* Re: net: Add network priority cgroup
From: Dave Taht @ 2011-11-14 12:32 UTC (permalink / raw)
To: Neil Horman; +Cc: netdev, John Fastabend, Robert Love, David S. Miller
In-Reply-To: <20111114114700.GA27284@hmsreliant.think-freely.org>
On Mon, Nov 14, 2011 at 12:47 PM, Neil Horman <nhorman@tuxdriver.com> wrote:
> On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
>> Data Center Bridging environments are currently somewhat limited in their
>> ability to provide a general mechanism for controlling traffic priority.
>> Specifically they are unable to administratively control the priority at which
>> various types of network traffic are sent.
>>
>> Currently, the only ways to set the priority of a network buffer are:
>>
>> 1) Through the use of the SO_PRIORITY socket option
>> 2) By using low level hooks, like a tc action
>>
>> (1) is difficult from an administrative perspective because it requires that the
>> application to be coded to not just assume the default priority is sufficient,
>> and must expose an administrative interface to allow priority adjustment. Such
>> a solution is not scalable in a DCB environment
>>
>> (2) is also difficult, as it requires constant administrative oversight of
>> applications so as to build appropriate rules to match traffic belonging to
>> various classes, so that priority can be appropriately set. It is further
>> limiting when DCB enabled hardware is in use, due to the fact that tc rules are
>> only run after a root qdisc has been selected (DCB enabled hardware may reserve
>> hw queues for various traffic classes and needs the priority to be set prior to
>> selecting the root qdisc)
>>
>>
>> I've discussed various solutions with John Fastabend, and we saw a cgroup as
>> being a good general solution to this problem. The network priority cgroup
>> allows for a per-interface priority map to be built per cgroup. Any traffic
>> originating from an application in a cgroup, that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface. This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
>>
>> Tested by John and myself, with good results
>>
>> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
>> CC: John Fastabend <john.r.fastabend@intel.com>
>> CC: Robert Love <robert.w.love@intel.com>
>> CC: "David S. Miller" <davem@davemloft.net>
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>
>
> Bump, any other thoughts here? Dave T. has some reasonable thoughts regarding
> the use of skb->priority, but IMO they really seem orthogonal to the purpose of
> this change. Any other reviews would be welcome.
Well, in part I've been playing catchup in the hope that lldp and
openlldp and/or this dcb netlink layer that I don't know anything
about (pointers please?) could help somehow to resolve the semantic
mess skb->priority has become in the first place.
I liked what was described here.
"What if we did at least carve out the DCB functionality away from
skb->priority? Since, AIUI, we're only concerning ourselves with
locally generated traffic here, we're talking
about skbs that have a socket attached to them. We could, instead of indexing
the prio_tc_map with skb->priority, we could index it with
skb->dev->priomap[skb->sk->prioidx] (as provided by this patch). The cgroup
then could be, instead of a strict priority cgroup, a queue_selector cgroup (or
something more appropriately named), and we don't have to touch skb->priority at
all. I'd really rather not start down that road until I got more opinions and
consensus on that, but it seems like a pretty good solution, one that would
allow hardware queue selection in systems that use things like DCB to co-exist
with software queueing features."
The piece that still kind of bothered me about the original proposal
(and perhaps this one) was that setting SO_PRIORITY in an app means
'give my packets more mojo'.
Taking something that took unprioritized packets and assigned them and
*them only* to a hardware queue struck me as possibly deprioritizing
the 'more mojo wanted' packets in the app(s), as they would end up in
some other, possibly overloaded, hardware queue.
So a cgroup that moves all of the packets from an application into a
given hardware queue, and then gets scheduled normally according to
skb->priority and friends (software queue, default of pfifo_fast,
etc), seems to make some sense to me. (I wouldn't mind if we had
abstractions for software queues, too, like, I need a software queue
with these properties, find me a place for it on the hardware - but
I'm dreaming)
One open question is where do packets generated from other subsystems
end up, if you are using a cgroup for the app? arp, dns, etc?
So to rephrase your original description from this:
>> Any traffic originating from an application in a cgroup, that does not explicitly set its
>> priority with SO_PRIORITY will have its priority assigned to the value
>> designated for that group on that interface. This allows a user space daemon,
>> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
>> based on the APP_TLV value received and administratively assign applications to
>> that priority using the existing cgroup utility infrastructure.
> John, Robert, if you're supportive of these changes, some Acks would be
> appreciated.
To this:
"Any traffic originating from an application in a cgroup, will have
its hardware queue assigned to the value designated for that group on
that interface. This allows a user space daemon, when conducting LLDP
negotiation with a DCB enabled peer to create a cgroup based on the
APP_TLV value received and administratively assign applications to
that hardware queue using the existing cgroup utility infrastructure."
Assuming we're on the same page here, what the heck is APP_TLV?
> John, Robert, if you're supportive of these changes, some Acks would be
> appreciated.
>
>
> Regards
> Neil
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
Dave Täht
SKYPE: davetaht
US Tel: 1-239-829-5608
FR Tel: 0638645374
http://www.bufferbloat.net
^ permalink raw reply
* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Pekka Enberg @ 2011-11-14 12:25 UTC (permalink / raw)
To: Asias He
Cc: Sasha Levin, Michael S. Tsirkin, kvm, mingo, gorcunov,
Krishna Kumar, Rusty Russell, virtualization, netdev
In-Reply-To: <4EC07729.3050303@gmail.com>
On Mon, Nov 14, 2011 at 4:04 AM, Asias He <asias.hejun@gmail.com> wrote:
> Why both the bandwidth and latency performance are dropping so dramatically
> with multiple VQ?
What's the expected benefit from multiple VQs i.e. why are doing the
patches Sasha?
^ permalink raw reply
* Re: net: Add network priority cgroup
From: Neil Horman @ 2011-11-14 11:47 UTC (permalink / raw)
To: netdev; +Cc: John Fastabend, Robert Love, David S. Miller
In-Reply-To: <1320868655-32592-1-git-send-email-nhorman@tuxdriver.com>
On Wed, Nov 09, 2011 at 02:57:33PM -0500, Neil Horman wrote:
> Data Center Bridging environments are currently somewhat limited in their
> ability to provide a general mechanism for controlling traffic priority.
> Specifically they are unable to administratively control the priority at which
> various types of network traffic are sent.
>
> Currently, the only ways to set the priority of a network buffer are:
>
> 1) Through the use of the SO_PRIORITY socket option
> 2) By using low level hooks, like a tc action
>
> (1) is difficult from an administrative perspective because it requires that the
> application to be coded to not just assume the default priority is sufficient,
> and must expose an administrative interface to allow priority adjustment. Such
> a solution is not scalable in a DCB environment
>
> (2) is also difficult, as it requires constant administrative oversight of
> applications so as to build appropriate rules to match traffic belonging to
> various classes, so that priority can be appropriately set. It is further
> limiting when DCB enabled hardware is in use, due to the fact that tc rules are
> only run after a root qdisc has been selected (DCB enabled hardware may reserve
> hw queues for various traffic classes and needs the priority to be set prior to
> selecting the root qdisc)
>
>
> I've discussed various solutions with John Fastabend, and we saw a cgroup as
> being a good general solution to this problem. The network priority cgroup
> allows for a per-interface priority map to be built per cgroup. Any traffic
> originating from an application in a cgroup, that does not explicitly set its
> priority with SO_PRIORITY will have its priority assigned to the value
> designated for that group on that interface. This allows a user space daemon,
> when conducting LLDP negotiation with a DCB enabled peer to create a cgroup
> based on the APP_TLV value received and administratively assign applications to
> that priority using the existing cgroup utility infrastructure.
>
> Tested by John and myself, with good results
>
> Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
> CC: John Fastabend <john.r.fastabend@intel.com>
> CC: Robert Love <robert.w.love@intel.com>
> CC: "David S. Miller" <davem@davemloft.net>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>
Bump, any other thoughts here? Dave T. has some reasonable thoughts regarding
the use of skb->priority, but IMO they really seem orthogonal to the purpose of
this change. Any other reviews would be welcome.
John, Robert, if you're supportive of these changes, some Acks would be
appreciated.
Regards
Neil
^ permalink raw reply
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
From: Jan Engelhardt @ 2011-11-14 11:38 UTC (permalink / raw)
To: Hans Schillstrom
Cc: Pablo Neira Ayuso, Hans Schillstrom, kaber, netfilter-devel,
netdev
In-Reply-To: <201111141019.43423.hans@schillstrom.com>
On Monday 2011-11-14 10:19, Hans Schillstrom wrote:
>
>On Sunday, November 13, 2011 18:05:28 Pablo Neira Ayuso wrote:
>> BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
>> (see recent Florian Westphal patches regarding reserve lookup for
>> instance).
>>
>> The IPv4 and IPv6 parts for HMARK look so different that I don't think
>> it makes sense to keep them into one single xt_HMARK thing with all
>> those conditional ifdefs for IPV6.
>>
>Ok I'll do that, for some reason a thought it was better with one module.
So do I. The module overhead is so much larger.
^ permalink raw reply
* Re: [LARTC] LARTC mailing list
From: Niccolò Belli @ 2011-11-14 11:24 UTC (permalink / raw)
To: Linux Advanced Routing & Traffic Control project
Cc: João Almeida, linux-new-lists, netfilter,
Linux Networking Developer Mailing List
In-Reply-To: <CAAQ01U7iwOMD9xrq3y_dOwvbmHRJZFU6VDb6QxW3qDa8mZWQnQ@mail.gmail.com>
Il 14/11/2011 12:16, João Almeida ha scritto:
> lartc.org <http://lartc.org> should be updated with decent docs as well...
Do we have access to lartc.org? Because if we have access we can even
keep (and fix) the original mailing list, or at least dump the addresses
database...
Otherwise we will have to make a completely new site too.
Niccolò
--
To unsubscribe from this list: send the line "unsubscribe linux-new-lists" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply
* RE: [PATCH] Phonet: set the pipe handle using setsockopt
From: Hemant-vilas RAMDASI @ 2011-11-14 10:36 UTC (permalink / raw)
To: Rémi Denis-Courmont, netdev@vger.kernel.org,
netdev-owner@vger.kernel.org
In-Reply-To: <2168215.tYVKKNoJjY@hector>
Remi,
> -----Original Message-----
> From: Rémi Denis-Courmont [mailto:remi.denis-courmont@nokia.com]
> Sent: Monday, November 14, 2011 3:00 PM
> To: Hemant-vilas RAMDASI; netdev@vger.kernel.org
> Subject: Re: [PATCH] Phonet: set the pipe handle using setsockopt
> > @@ -180,6 +182,7 @@ static inline __u8 pn_sockaddr_get_resource(const
> struct
> > sockaddr_pn *spn) /* Phonet device ioctl requests */
> > #ifdef __KERNEL__
> > #define SIOCPNGAUTOCONF (SIOCDEVPRIVATE + 0)
> > +#define SIOPNPIPE_ENABLE _IO(SIOCPNGAUTOCONF, 1)
>
> Does this even work? I am not an expert on this, but I would think that
> device-private controls are routed to the network device, not the
> socket. In
> any case, it does not seem right.
>
Yes, it works. The ioctl is routed to per-socket functions.
> > @@ -863,14 +898,31 @@ static int pep_sock_connect(struct sock *sk,
> struct
> > sockaddr *addr, int len) int err;
> > u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
> >
> > - pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
> > + if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
> > + pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
> > +
> > err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
> > - PN_PIPE_ENABLE, data, 4);
> > - if (err) {
> > - pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
>
> The current backlog functions assume that pipe_handle =
> PN_PIPE_INVALID_HANDLE
> if the socket is not yet connected. That's why the old code would clear
> the
> pipe_handle always on error.
Ok..I think clearing pipe-handle on connect error should take care of this.
User-space need to set the handle again.
>
> So it is not that simple.
>
> > @@ -894,6 +946,16 @@ static int pep_ioctl(struct sock *sk, int cmd,
> unsigned
> > + case SIOPNPIPE_ENABLE:
> > + if (sk->sk_state == TCP_SYN_SENT)
> > + return -EBUSY;
> > + else if (sk->sk_state == TCP_ESTABLISHED)
> > + return -EISCONN;
> > + else
> > + return pep_sock_enable(sk, NULL, 0);
> > + break;
> > }
>
> I strongly suspect insufficient locking here.
Ok..
>
> >
> > return -ENOIOCTLCMD;
> > @@ -959,6 +1021,18 @@ static int pep_setsockopt(struct sock *sk, int
> level,
> > int optname, }
> > goto out_norel;
> >
> > + case PNPIPE_HANDLE:
> > + if ((sk->sk_state == TCP_CLOSE) &&
> > + (val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
> > + pn->pipe_handle = val;
> > + else
> > + err = -EINVAL;
> > + break;
>
> Same problem regarding pipe_handle as above.
I think locking already exists..
> > @@ -994,6 +1068,17 @@ static int pep_getsockopt(struct sock *sk, int
> level,
> > int optname, return -EINVAL;
> > break;
> >
> > + case PNPIPE_ENABLE:
> > + if (sk->sk_state == TCP_ESTABLISHED)
> > + val = 1;
> > + else
> > + val = 0;
> > + break;
>
> Do you still need this read-only option?
Yes.
^ permalink raw reply
* Disabling TCP slow start after idle
From: David Laight @ 2011-11-14 10:16 UTC (permalink / raw)
To: netdev
(oOriginally sent to linux-kernel@, but suggested I report here.)
We have some connections that suffer very badly from the TCP
'slow start' algorithm. These are connections that will
always be local - they may be MAC-Switch-MAC using RGMII
crossover, they might also be connected via an external
switch. In either case the RTT is most likely to be almost
zero, certainly below 1ms.
The traffic is single packets (carrying another protocol)
so we have Nagle disabled and the send and receive sides
run separately. So the traffic is neither bulk, nor
command/response.
This means that there is very rarely any unacked data,
so almost every packet is sent using 'slow start'.
If the external switch drops a packet (they do!) then
slow start stops more packets being sent, and nothing
progresses for about 1.5 seconds by which time there
is a significant amount of traffic queued and, in some
cases, data has to be discarded.
Similar issues happen if the receiving system decides
to defer the ack until a timer tick (instead of
sending one after every second packet). In this case
only 4 packets are sent. (We fixed this one be sending
a software ACK every 4 packets.)
Quite cleary the 'slow start' algorithm just doesn't
work in these cases.
I found this https://lkml.org/lkml/2010/4/9/427
discussion about a socket option to disable slow start.
But it seems that some people are completely against the idea.
I'd have thought that the global option would be more of a
problem - since that will affect ftp connections to remote
hosts where slow start is alomost certainly benefitial.
I'd have thought it would be sensible to allow one (or more)
of the following (either as a sysctl, socket option, or
code change):
1) Disable slow start for the local subnet.
2) Disable slow start for connections with very low RTT.
3) Disable slow start for a minimum period with no traffic
(after the last packet is acked).
I'm not sure of the resolution used by the Linux RTT
calculations. I know NetBSD had a recent set of patches
to fix calculation errors with low RTT because the code
had been written when all RTT were much longer.
David
^ permalink raw reply
* Re: [RFC] kvm tools: Implement multiple VQ for virtio-net
From: Sasha Levin @ 2011-11-14 10:15 UTC (permalink / raw)
To: Asias He
Cc: Michael S. Tsirkin, penberg, kvm, mingo, gorcunov, Krishna Kumar,
Rusty Russell, virtualization, netdev
In-Reply-To: <4EC07729.3050303@gmail.com>
On Mon, 2011-11-14 at 10:04 +0800, Asias He wrote:
> Hi, Shsha
>
> On 11/13/2011 11:00 PM, Sasha Levin wrote:
> > On Sun, 2011-11-13 at 12:24 +0200, Michael S. Tsirkin wrote:
> >> On Sat, Nov 12, 2011 at 12:12:01AM +0200, Sasha Levin wrote:
> >>> This is a patch based on Krishna Kumar's patch series which implements
> >>> multiple VQ support for virtio-net.
> >>>
> >>> The patch was tested with ver3 of the patch.
> >>>
> >>> Cc: Krishna Kumar<krkumar2@in.ibm.com>
> >>> Cc: Michael S. Tsirkin<mst@redhat.com>
> >>> Cc: Rusty Russell<rusty@rustcorp.com.au>
> >>> Cc: virtualization@lists.linux-foundation.org
> >>> Cc: netdev@vger.kernel.org
> >>> Signed-off-by: Sasha Levin<levinsasha928@gmail.com>
> >>
> >> Any performance numbers?
> >
> > I tried finding a box with more than two cores so I could test it on
> > something like that as well.
> >
> >> From what I see this patch causes a performance regression on my 2 core
> > box.
> >
> > I'll send an updated KVM tools patch in a bit as well.
> >
> > Before:
> >
> > # netperf -H 192.168.33.4,ipv4 -t TCP_RR
> > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> > to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
> > Local /Remote
> > Socket Size Request Resp. Elapsed Trans.
> > Send Recv Size Size Time Rate
> > bytes Bytes bytes bytes secs. per sec
> >
> > 16384 87380 1 1 10.00 11160.63
> > 16384 87380
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_RR
> > MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> > to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
> > Local /Remote
> > Socket Size Request Resp. Elapsed Trans.
> > Send Recv Size Size Time Rate
> > bytes Bytes bytes bytes secs. per sec
> >
> > 122880 122880 1 1 10.00 12072.64
> > 229376 229376
> >
> > # netperf -H 192.168.33.4,ipv4 -t TCP_STREAM
> > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 16384 10.00 4654.50
> >
> > netperf -H 192.168.33.4,ipv4 -t TCP_STREAM -- -m 128
> > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 128 10.00 635.45
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 122880 65507 10.00 113894 0 5968.54
> > 229376 10.00 89373 4683.54
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_STREAM -- -m 128
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 122880 128 10.00 550634 0 56.38
> > 229376 10.00 398786 40.84
> >
> >
> > After:
> >
> > # netperf -H 192.168.33.4,ipv4 -t TCP_RR
> > MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> > to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
> > Local /Remote
> > Socket Size Request Resp. Elapsed Trans.
> > Send Recv Size Size Time Rate
> > bytes Bytes bytes bytes secs. per sec
> >
> > 16384 87380 1 1 10.00 8952.47
> > 16384 87380
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_RR
> > MIGRATED UDP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
> > to 192.168.33.4 (192.168.33.4) port 0 AF_INET : first burst 0
> > Local /Remote
> > Socket Size Request Resp. Elapsed Trans.
> > Send Recv Size Size Time Rate
> > bytes Bytes bytes bytes secs. per sec
> >
> > 122880 122880 1 1 10.00 9534.52
> > 229376 229376
> >
> > # netperf -H 192.168.33.4,ipv4 -t TCP_STREAM
> > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 16384 10.13 2278.23
> >
> > # netperf -H 192.168.33.4,ipv4 -t TCP_STREAM -- -m 128
> > MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Recv Send Send
> > Socket Socket Message Elapsed
> > Size Size Size Time Throughput
> > bytes bytes bytes secs. 10^6bits/sec
> >
> > 87380 16384 128 10.00 623.27
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_STREAM
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 122880 65507 10.00 136930 0 7175.72
> > 229376 10.00 16726 876.51
> >
> > # netperf -H 192.168.33.4,ipv4 -t UDP_STREAM -- -m 128
> > MIGRATED UDP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> > 192.168.33.4 (192.168.33.4) port 0 AF_INET
> > Socket Message Elapsed Messages
> > Size Size Time Okay Errors Throughput
> > bytes bytes secs # # 10^6bits/sec
> >
> > 122880 128 10.00 982492 0 100.61
> > 229376 10.00 249597 25.56
> >
>
> Why both the bandwidth and latency performance are dropping so
> dramatically with multiple VQ?
It looks like theres no hash sync between host and guest, which makes
the RX VQ change for every packet. This is my guess.
--
Sasha.
^ permalink raw reply
* Re: [PATCH] brcmsmac: Use current logging styles
From: Arend van Spriel @ 2011-11-14 10:13 UTC (permalink / raw)
To: Joe Perches; +Cc: John W. Linville, linux-wireless, netdev, LKML
In-Reply-To: <1321213264.15834.4.camel@Joe-Laptop>
On 11/13/2011 08:41 PM, Joe Perches wrote:
> Add and use pr_fmt and pr_<level>
> Remove useless double parentheses from macros.
> Remove function names from format strings, add to pr_debug use.
> Coalesce formats.
> Remove uncompileable undeclared variable in a DMA_NONE use.
>
> Signed-off-by: Joe Perches <joe@perches.com>
> ---
> drivers/net/wireless/brcm80211/brcmsmac/aiutils.c | 14 +-
> drivers/net/wireless/brcm80211/brcmsmac/dma.c | 145 ++++++++++-----------
> 2 files changed, 80 insertions(+), 79 deletions(-)
Thanks for the service.
Acked-by: Arend van Spriel <arend@broadcom.com>
Gr. AvS
^ permalink raw reply
* Re: [PATCH 2/4] ipv4: Update pmtu informations on inetpeer only for output routes
From: Steffen Klassert @ 2011-11-14 10:12 UTC (permalink / raw)
To: David Miller; +Cc: netdev
In-Reply-To: <20111108.143630.106539981030509701.davem@davemloft.net>
On Tue, Nov 08, 2011 at 02:36:30PM -0500, David Miller wrote:
>
> What I think you can do to solve this problem is explicitly use
> dst->ops->default_mtu() in ip_forward() instead of dst_mtu().
>
Unfortunately, that's not sufficient. We need to do an input route
check in ip_skb_dst_mtu() at least to get plain ipv4 to work,
IPsec is still broken then. So I assume most (all?) callers of
dst_mtu() don't want to get the cached PMTU for input routes on ipv4.
So for the moment I'm thinking about adding an ip_dst_mtu()
function that returns dst->ops->default_mtu() for input routes
and dst_mtu() for output routes. Then we could convert the
dst_mtu() users in net/ipv4/ over to this one.
^ permalink raw reply
* [PATCH RESUBMIT3 net-next 1/2] IPv6 routing, NLM_F_* flag support: warn if new route is created without NLM_F_CREATE
From: Matti Vaittinen @ 2011-11-14 10:14 UTC (permalink / raw)
To: netdev; +Cc: davem
The support for NLM_F_* flags at IPv6 routing requests.
Warn if NLM_F_CREATE flag is not defined for RTM_NEWROUTE request,
creating new table. Later NLM_F_CREATE may be required for
new route creation.
Patch created against linux-3.2-rc1
Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
--
diff -uNr Linux-3.2-rc1.orig/net/ipv6/route.c Linux-3.2-rc1.new/net/ipv6/route.c
--- Linux-3.2-rc1.orig/net/ipv6/route.c 2011-11-10 08:44:18.000000000 +0200
+++ Linux-3.2-rc1.new/net/ipv6/route.c 2011-11-10 08:46:15.000000000 +0200
@@ -1230,9 +1230,18 @@
if (cfg->fc_metric == 0)
cfg->fc_metric = IP6_RT_PRIO_USER;
- table = fib6_new_table(net, cfg->fc_table);
+ err = -ENOBUFS;
+ if (NULL != cfg->fc_nlinfo.nlh &&
+ !(cfg->fc_nlinfo.nlh->nlmsg_flags&NLM_F_CREATE)) {
+ table = fib6_get_table(net, cfg->fc_table);
+ if (table == NULL) {
+ printk(KERN_WARNING "IPv6: NLM_F_CREATE should be specified when creating new route\n");
+ table = fib6_new_table(net, cfg->fc_table);
+ }
+ } else {
+ table = fib6_new_table(net, cfg->fc_table);
+ }
if (table == NULL) {
- err = -ENOBUFS;
goto out;
}
^ permalink raw reply
* [PATCH RESUBMIT3 net-next 2/2] IPv6 routing, NLM_F_* flag support: REPLACE and EXCL flags support, warn about missing CREATE flag
From: Matti Vaittinen @ 2011-11-14 10:15 UTC (permalink / raw)
To: davem; +Cc: netdev
The support for NLM_F_* flags at IPv6 routing requests.
If NLM_F_CREATE flag is not defined for RTM_NEWROUTE request,
warning is printed, but no error is returned. Instead new route is
added. Later NLM_F_CREATE may be required for
new route creation.
Exception is when NLM_F_REPLACE flag is given without NLM_F_CREATE, and
no matching route is found. In this case it should be safe to assume
that the request issuer is familiar with NLM_F_* flags, and does really
not want route to be created.
Specifying NLM_F_REPLACE flag will now make the kernel to search for
matching route, and replace it with new one. If no route is found and
NLM_F_CREATE is specified as well, then new route is created.
Also, specifying NLM_F_EXCL will yield returning of error if matching
route is found.
Patch created against linux-3.2-rc1
Signed-off-by: Matti Vaittinen <Mazziesaccount@gmail.com>
--
diff -uNr Linux-3.2-rc1.orig/net/ipv6/ip6_fib.c Linux-3.2-rc1.new/net/ipv6/ip6_fib.c
--- Linux-3.2-rc1.orig/net/ipv6/ip6_fib.c 2011-11-10 08:44:18.000000000 +0200
+++ Linux-3.2-rc1.new/net/ipv6/ip6_fib.c 2011-11-14 11:47:37.000000000 +0200
@@ -425,7 +425,8 @@
static struct fib6_node * fib6_add_1(struct fib6_node *root, void *addr,
int addrlen, int plen,
- int offset)
+ int offset, int allow_create,
+ int replace_required)
{
struct fib6_node *fn, *in, *ln;
struct fib6_node *pn = NULL;
@@ -447,8 +448,12 @@
* Prefix match
*/
if (plen < fn->fn_bit ||
- !ipv6_prefix_equal(&key->addr, addr, fn->fn_bit))
+ !ipv6_prefix_equal(&key->addr, addr, fn->fn_bit)) {
+ if (!allow_create)
+ printk(KERN_WARNING
+ "IPv6: NLM_F_CREATE should be set when creating new route\n");
goto insert_above;
+ }
/*
* Exact match ?
@@ -477,10 +482,26 @@
fn = dir ? fn->right: fn->left;
} while (fn);
+ if (replace_required && !allow_create) {
+ /* We should not create new node because
+ * NLM_F_REPLACE was specified without NLM_F_CREATE
+ * I assume it is safe to require NLM_F_CREATE when
+ * REPLACE flag is used! Later we may want to remove the
+ * check for replace_required, because according
+ * to netlink specification, NLM_F_CREATE
+ * MUST be specified if new route is created.
+ * That would keep IPv6 consistent with IPv4
+ */
+ printk(KERN_WARNING
+ "IPv6: NLM_F_CREATE should be set when creating new route - ignoring request\n");
+ return ERR_PTR(-ENOENT);
+ }
/*
* We walked to the bottom of tree.
* Create new leaf node without children.
*/
+ if (!allow_create)
+ printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
ln = node_alloc();
@@ -614,6 +635,12 @@
{
struct rt6_info *iter = NULL;
struct rt6_info **ins;
+ int replace = (NULL != info &&
+ NULL != info->nlh &&
+ (info->nlh->nlmsg_flags&NLM_F_REPLACE));
+ int add = ((NULL == info || NULL == info->nlh) ||
+ (info->nlh->nlmsg_flags&NLM_F_CREATE));
+ int found = 0;
ins = &fn->leaf;
@@ -626,6 +653,13 @@
/*
* Same priority level
*/
+ if (NULL != info->nlh &&
+ (info->nlh->nlmsg_flags&NLM_F_EXCL))
+ return -EEXIST;
+ if (replace) {
+ found++;
+ break;
+ }
if (iter->rt6i_dev == rt->rt6i_dev &&
iter->rt6i_idev == rt->rt6i_idev &&
@@ -655,17 +689,40 @@
/*
* insert node
*/
+ if (!replace) {
+ if (!add)
+ printk(KERN_WARNING "IPv6: NLM_F_CREATE should be set when creating new route\n");
+
+add:
+ rt->dst.rt6_next = iter;
+ *ins = rt;
+ rt->rt6i_node = fn;
+ atomic_inc(&rt->rt6i_ref);
+ inet6_rt_notify(RTM_NEWROUTE, rt, info);
+ info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
+
+ if ((fn->fn_flags & RTN_RTINFO) == 0) {
+ info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
+ fn->fn_flags |= RTN_RTINFO;
+ }
- rt->dst.rt6_next = iter;
- *ins = rt;
- rt->rt6i_node = fn;
- atomic_inc(&rt->rt6i_ref);
- inet6_rt_notify(RTM_NEWROUTE, rt, info);
- info->nl_net->ipv6.rt6_stats->fib_rt_entries++;
-
- if ((fn->fn_flags & RTN_RTINFO) == 0) {
- info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
- fn->fn_flags |= RTN_RTINFO;
+ } else {
+ if (!found) {
+ if (add)
+ goto add;
+ printk(KERN_WARNING "IPv6: NLM_F_REPLACE set, but no existing node found!\n");
+ return -ENOENT;
+ }
+ *ins = rt;
+ rt->rt6i_node = fn;
+ rt->dst.rt6_next = iter->dst.rt6_next;
+ atomic_inc(&rt->rt6i_ref);
+ inet6_rt_notify(RTM_NEWROUTE, rt, info);
+ rt6_release(iter);
+ if ((fn->fn_flags & RTN_RTINFO) == 0) {
+ info->nl_net->ipv6.rt6_stats->fib_route_nodes++;
+ fn->fn_flags |= RTN_RTINFO;
+ }
}
return 0;
@@ -696,9 +753,25 @@
{
struct fib6_node *fn, *pn = NULL;
int err = -ENOMEM;
+ int allow_create = 1;
+ int replace_required = 0;
+ if (NULL != info && NULL != info->nlh) {
+ if (!(info->nlh->nlmsg_flags&NLM_F_CREATE))
+ allow_create = 0;
+ if ((info->nlh->nlmsg_flags&NLM_F_REPLACE))
+ replace_required = 1;
+ }
+ if (!allow_create && !replace_required)
+ printk(KERN_WARNING "IPv6: RTM_NEWROUTE with no NLM_F_CREATE or NLM_F_REPLACE\n");
fn = fib6_add_1(root, &rt->rt6i_dst.addr, sizeof(struct in6_addr),
- rt->rt6i_dst.plen, offsetof(struct rt6_info, rt6i_dst));
+ rt->rt6i_dst.plen, offsetof(struct rt6_info, rt6i_dst),
+ allow_create, replace_required);
+
+ if (IS_ERR(fn)) {
+ err = PTR_ERR(fn);
+ fn = NULL;
+ }
if (fn == NULL)
goto out;
@@ -736,7 +809,8 @@
sn = fib6_add_1(sfn, &rt->rt6i_src.addr,
sizeof(struct in6_addr), rt->rt6i_src.plen,
- offsetof(struct rt6_info, rt6i_src));
+ offsetof(struct rt6_info, rt6i_src),
+ allow_create, replace_required);
if (sn == NULL) {
/* If it is failed, discard just allocated
@@ -753,8 +827,13 @@
} else {
sn = fib6_add_1(fn->subtree, &rt->rt6i_src.addr,
sizeof(struct in6_addr), rt->rt6i_src.plen,
- offsetof(struct rt6_info, rt6i_src));
+ offsetof(struct rt6_info, rt6i_src),
+ allow_create, replace_required);
+ if (IS_ERR(sn)) {
+ err = PTR_ERR(sn);
+ sn = NULL;
+ }
if (sn == NULL)
goto st_failure;
}
^ permalink raw reply
* Re: [PATCH] Phonet: set the pipe handle using setsockopt
From: Rémi Denis-Courmont @ 2011-11-14 9:29 UTC (permalink / raw)
To: ext Hemant Vilas RAMDASI, netdev
In-Reply-To: <1321257210-17200-1-git-send-email-hemant.ramdasi@stericsson.com>
Le Lundi 14 Novembre 2011 13:23:30 ext Hemant Vilas RAMDASI a écrit :
> From: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
>
> This provides flexibility to set the pipe handle
> using setsockopt. The pipe can be enabled (if disabled) later
> using ioctl.
>
> Signed-off-by: Hemant Ramdasi <hemant.ramdasi@stericsson.com>
> Signed-off-by: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
> ---
> include/linux/phonet.h | 3 +
> net/phonet/pep.c | 105
> +++++++++++++++++++++++++++++++++++++++++++----- 2 files changed, 98
> insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/phonet.h b/include/linux/phonet.h
> index 6fb1384..4c00551 100644
> --- a/include/linux/phonet.h
> +++ b/include/linux/phonet.h
> @@ -37,6 +37,8 @@
> #define PNPIPE_ENCAP 1
> #define PNPIPE_IFINDEX 2
> #define PNPIPE_HANDLE 3
> +#define PNPIPE_ENABLE 4
> +#define PNPIPE_INITSTATE 5
>
> #define PNADDR_ANY 0
> #define PNADDR_BROADCAST 0xFC
> @@ -180,6 +182,7 @@ static inline __u8 pn_sockaddr_get_resource(const struct
> sockaddr_pn *spn) /* Phonet device ioctl requests */
> #ifdef __KERNEL__
> #define SIOCPNGAUTOCONF (SIOCDEVPRIVATE + 0)
> +#define SIOPNPIPE_ENABLE _IO(SIOCPNGAUTOCONF, 1)
Does this even work? I am not an expert on this, but I would think that
device-private controls are routed to the network device, not the socket. In
any case, it does not seem right.
>
> struct if_phonet_autoconf {
> uint8_t device;
> diff --git a/net/phonet/pep.c b/net/phonet/pep.c
> index f17fd84..48339b9 100644
> --- a/net/phonet/pep.c
> +++ b/net/phonet/pep.c
> @@ -533,6 +533,29 @@ static int pep_connresp_rcv(struct sock *sk, struct
> sk_buff *skb) return pipe_handler_send_created_ind(sk);
> }
>
> +static int pep_enableresp_rcv(struct sock *sk, struct sk_buff *skb)
> +{
> + struct pnpipehdr *hdr = pnp_hdr(skb);
> +
> + if (hdr->error_code != PN_PIPE_NO_ERROR)
> + return -ECONNREFUSED;
> +
> + return pep_indicate(sk, PNS_PIPE_ENABLED_IND, 0 /* sub-blocks */,
> + NULL, 0, GFP_ATOMIC);
> +
> +}
> +
> +static void pipe_start_flow_control(struct sock *sk)
> +{
> + struct pep_sock *pn = pep_sk(sk);
> +
> + if (!pn_flow_safe(pn->tx_fc)) {
> + atomic_set(&pn->tx_credits, 1);
> + sk->sk_write_space(sk);
> + }
> + pipe_grant_credits(sk, GFP_ATOMIC);
> +}
> +
> /* Queue an skb to an actively connected sock.
> * Socket lock must be held. */
> static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
> @@ -578,13 +601,25 @@ static int pipe_handler_do_rcv(struct sock *sk, struct
> sk_buff *skb) sk->sk_state = TCP_CLOSE_WAIT;
> break;
> }
> + if (pn->init_enable == PN_PIPE_DISABLE)
> + sk->sk_state = TCP_SYN_RECV;
> + else {
> + sk->sk_state = TCP_ESTABLISHED;
> + pipe_start_flow_control(sk);
> + }
> + break;
>
> - sk->sk_state = TCP_ESTABLISHED;
> - if (!pn_flow_safe(pn->tx_fc)) {
> - atomic_set(&pn->tx_credits, 1);
> - sk->sk_write_space(sk);
> + case PNS_PEP_ENABLE_RESP:
> + if (sk->sk_state != TCP_SYN_SENT)
> + break;
> +
> + if (pep_enableresp_rcv(sk, skb)) {
> + sk->sk_state = TCP_CLOSE_WAIT;
> + break;
> }
> - pipe_grant_credits(sk, GFP_ATOMIC);
> +
> + sk->sk_state = TCP_ESTABLISHED;
> + pipe_start_flow_control(sk);
> break;
>
> case PNS_PEP_DISCONNECT_RESP:
> @@ -863,14 +898,31 @@ static int pep_sock_connect(struct sock *sk, struct
> sockaddr *addr, int len) int err;
> u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
>
> - pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
> + if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
> + pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
> +
> err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
> - PN_PIPE_ENABLE, data, 4);
> - if (err) {
> - pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
The current backlog functions assume that pipe_handle = PN_PIPE_INVALID_HANDLE
if the socket is not yet connected. That's why the old code would clear the
pipe_handle always on error.
So it is not that simple.
> + pn->init_enable, data, 4);
> + if (err)
> return err;
> - }
> +
> + sk->sk_state = TCP_SYN_SENT;
> +
> + return 0;
> +}
> +
> +static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
> +{
> + int err;
> +
> + err = pipe_handler_request(sk, PNS_PEP_ENABLE_REQ, PAD,
> + NULL, 0);
> +
> + if (err)
> + return err;
> +
> sk->sk_state = TCP_SYN_SENT;
> +
> return 0;
> }
>
> @@ -894,6 +946,16 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned
> long arg) answ = 0;
> release_sock(sk);
> return put_user(answ, (int __user *)arg);
> + break;
> +
> + case SIOPNPIPE_ENABLE:
> + if (sk->sk_state == TCP_SYN_SENT)
> + return -EBUSY;
> + else if (sk->sk_state == TCP_ESTABLISHED)
> + return -EISCONN;
> + else
> + return pep_sock_enable(sk, NULL, 0);
> + break;
> }
I strongly suspect insufficient locking here.
>
> return -ENOIOCTLCMD;
> @@ -959,6 +1021,18 @@ static int pep_setsockopt(struct sock *sk, int level,
> int optname, }
> goto out_norel;
>
> + case PNPIPE_HANDLE:
> + if ((sk->sk_state == TCP_CLOSE) &&
> + (val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
> + pn->pipe_handle = val;
> + else
> + err = -EINVAL;
> + break;
Same problem regarding pipe_handle as above.
> +
> + case PNPIPE_INITSTATE:
> + pn->init_enable = !!val;
> + break;
> +
> default:
> err = -ENOPROTOOPT;
> }
> @@ -994,6 +1068,17 @@ static int pep_getsockopt(struct sock *sk, int level,
> int optname, return -EINVAL;
> break;
>
> + case PNPIPE_ENABLE:
> + if (sk->sk_state == TCP_ESTABLISHED)
> + val = 1;
> + else
> + val = 0;
> + break;
Do you still need this read-only option?
> +
> + case PNPIPE_INITSTATE:
> + val = pn->init_enable;
> + break;
> +
> default:
> return -ENOPROTOOPT;
> }
--
Rémi Denis-Courmont
http://www.remlab.net/
^ permalink raw reply
* Re: [PATCH RESUBMIT2 net-next 1/2] IPv6 routing, NLM_F_* flag support: warn if NEWROUTE lacks of both CREATE and REPLACE flags
From: Matti Vaittinen @ 2011-11-14 9:37 UTC (permalink / raw)
To: ext David Miller; +Cc: netdev
In-Reply-To: <20111114.033608.2000639077648429389.davem@davemloft.net>
On Mon, 2011-11-14 at 03:36 -0500, ext David Miller wrote:
> From: Matti Vaittinen <matti.vaittinen@nsn.com>
> Date: Mon, 14 Nov 2011 10:31:20 +0200
>
> > Messages can be changed, that's no problem. I would like to hear suggestions
> > because I personally see not much problems in current messages.
>
> You don't even say that the route request is for an ipv6 route.
I'll add the IPv6 in prints, and also look through the ŕest of the prints
in 2/2 patch && see if I can think of something to improve.
Also, this patch (1/2) has wrong subject and explanation - total brainfart
from my side, sorry. The check for NLM_F_REPLACE is not in this, but in
the 2/2 patch. I'll correct subject and explanation too.
--
Matti Vaittinen
+358 504863070
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Told a UDP joke the other night...
...but I'm not sure everyone got it...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply
* Re: [v2 PATCH 1/2] NETFILTER module xt_hmark new target for HASH based fw
From: Hans Schillstrom @ 2011-11-14 9:19 UTC (permalink / raw)
To: Pablo Neira Ayuso
Cc: Hans Schillstrom, kaber, jengelh, netfilter-devel, netdev
In-Reply-To: <20111113170528.GB16851@1984>
On Sunday, November 13, 2011 18:05:28 Pablo Neira Ayuso wrote:
> BTW, I think you should split xt_HMARK to ipt_HMARK and ip6t_HMARK
> (see recent Florian Westphal patches regarding reserve lookup for
> instance).
>
> The IPv4 and IPv6 parts for HMARK look so different that I don't think
> it makes sense to keep them into one single xt_HMARK thing with all
> those conditional ifdefs for IPV6.
>
Ok I'll do that, for some reason a thought it was better with one module.
--
Hans
^ permalink raw reply
* Re: [PATCH] net, bridge: print log message after state changed
From: Fritz, Wolfgang @ 2011-11-14 9:10 UTC (permalink / raw)
To: David Miller; +Cc: netdev, shemminger, bridge, Brunck, Holger
In-Reply-To: <20111114.020749.1176229013272872343.davem@davemloft.net>
[-- Attachment #1.1: Type: text/plain, Size: 3241 bytes --]
Am Montag, den 14.11.2011, 02:07 -0500 schrieb David Miller:
> From: "Fritz, Wolfgang" <Wolfgang.Fritz@keymile.com>
> Date: Mon, 14 Nov 2011 08:03:31 +0100
>
> >
> > Am Montag, den 14.11.2011, 00:37 -0500 schrieb David Miller:
> >> From: Holger Brunck <holger.brunck@keymile.com>
> >> Date: Thu, 10 Nov 2011 17:18:54 +0100
> >>
> >> > From: Wolfgang Fritz <wolfgang.fritz@keymile.com>
> >> >
> >> > Function br_log_state writes log message "... entering XXXX state" so it
> >> > should be called after the state has changed and not before.
> >> >
> >> > Signed-off-by: Wolfgang Fritz <wolfgang.fritz@keymile.com>
> >> > Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
> >>
> >> "entering" can roughly mean "about to enter"
> >>
> >
> > Exactly.
> >
> >> The message is therefore appropriately timed as far as I'm concerned.
> >>
> >
> > It's not.
> >
> > Please test: create a bridge with STP disabled, add an interface to the
> > bridge and set that interface down. You get the message "... entering
> > forwarding state". That's wrong because it's "about to enter" disabled
> > state some lines later.
> >
> > All other (4) calls to br_log_state are located after the state change,
> > see for example br_stp_enable_port() just some lines above the patch.
>
> I would never expect an "entering" message to print out after the event
> happens, and in fact I'd _always_ want to see it beforehand so that if
> the state change caused a crash or similar it'd be that much easier
> to pinpoint the proper location.
>
OK. I see what you mean.
Proposal 1 (simple):
- Modify log message to "Entered xxx state"
- Apply patch
- Do not touch other br_log_message calls.
Proposal 2 (more complicated):
Add parameter "newstate" to br_log_message and put br_log_message before
the state change. Print log message only if new state != current state.
Advantage: Reduces log spam when STP is disabled. After the last
modifications to the bridge code which force "forwarding" state if STP
is disabled, you get lots of bogus messages "entering forwarding state":
Event old message new message
link up learning forwarding
forwarding delay expired forwarding forwarding
interface down disabled forwarding
link down disabled forwarding
If you accept one of the proposals, I'll prepare a patch. Otherwise, we
keep my original patch in our local tree.
> I'm still not applying this. If the other log messages behave
> differently, they are broken, so go fix those instead.
Regards,
Wolfgang
--
"I love deadlines. I like the whooshing sound they make as they fly by"
(Douglas Adams)
=======================================================================
KEYMILE GmbH mailto:wolfgang.fritz@keymile.com
Wolfgang Fritz Phone: +49 (0)511 6747-692
Wohlenbergstr. 3 Fax: +49 (0)511 6747-777
D-30179 Hannover http://www.keymile.com
Managing Directors: Björn Claaßen, Dipl.-Kfm. Andreas Gebauer
Legal structure: GmbH, Registered office: Hanover, HRB 61069
Local court Hanover, VAT-Reg.-No.: DE 812282795,
WEEE-Reg.-No.: DE 59336750
[-- Attachment #1.2: Type: text/html, Size: 5051 bytes --]
[-- Attachment #2: Type: text/plain, Size: 159 bytes --]
_______________________________________________
Bridge mailing list
Bridge@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bridge
^ permalink raw reply
* Re: [PATCH net-next] bnx2x: add endline at end of message
From: Dmitry Kravkov @ 2011-11-14 8:44 UTC (permalink / raw)
To: davem@davemloft.net; +Cc: netdev@vger.kernel.org, Joe Perches
In-Reply-To: <1321260039-19254-1-git-send-email-dmitry@broadcom.com>
git send-email ignored <Reported-by> tag, adding Joe
On Mon, 2011-11-14 at 00:40 -0800, Dmitry Kravkov wrote:
> Reported-by: Joe Perches <joe@perches.com>
> Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
> ---
> drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> index 33ff60d..c09e59a 100644
> --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> @@ -8520,7 +8520,7 @@ sp_rtnl_not_reset:
> * damage
> */
> if (test_and_clear_bit(BNX2X_SP_RTNL_FAN_FAILURE, &bp->sp_rtnl_state)) {
> - DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver");
> + DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver\n");
> netif_device_detach(bp->dev);
> bnx2x_close(bp->dev);
> }
^ permalink raw reply
* [PATCH net-next] bnx2x: add endline at end of message
From: Dmitry Kravkov @ 2011-11-14 8:40 UTC (permalink / raw)
To: davem, netdev; +Cc: Dmitry Kravkov
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 33ff60d..c09e59a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8520,7 +8520,7 @@ sp_rtnl_not_reset:
* damage
*/
if (test_and_clear_bit(BNX2X_SP_RTNL_FAN_FAILURE, &bp->sp_rtnl_state)) {
- DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver");
+ DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver\n");
netif_device_detach(bp->dev);
bnx2x_close(bp->dev);
}
--
1.7.7.2
^ permalink raw reply related
* Re: [PATCH v3 net-next 09/12] bnx2x: add fan failure event handling
From: David Miller @ 2011-11-14 8:37 UTC (permalink / raw)
To: dmitry; +Cc: joe, netdev, ariele, eilong
In-Reply-To: <1321259286.4651.9.camel@lb-tlvb-dmitry>
Come on :-/
Please make a real patch submission instead of hijacking the posting
for the original patch.
The subject line here is going to be completely wrong if I just
apply this thing as-is.
^ permalink raw reply
* Re: [PATCH RESUBMIT2 net-next 1/2] IPv6 routing, NLM_F_* flag support: warn if NEWROUTE lacks of both CREATE and REPLACE flags
From: David Miller @ 2011-11-14 8:36 UTC (permalink / raw)
To: matti.vaittinen; +Cc: netdev
In-Reply-To: <1321259480.23935.75.camel@hakki>
From: Matti Vaittinen <matti.vaittinen@nsn.com>
Date: Mon, 14 Nov 2011 10:31:20 +0200
> Messages can be changed, that's no problem. I would like to hear suggestions
> because I personally see not much problems in current messages.
You don't even say that the route request is for an ipv6 route.
How in the world can we track down the application if you don't
even say what protocol the route request is for?
^ permalink raw reply
* Re: [PATCH v3 net-next 09/12] bnx2x: add fan failure event handling
From: Dmitry Kravkov @ 2011-11-14 8:28 UTC (permalink / raw)
To: Joe Perches, davem, netdev; +Cc: Ariel Elior, Eilon Greenstein
In-Reply-To: <80A6ED01A151614884EE6C5D6AF0F2E9099806AF42@SJEXCHCCR01.corp.ad.broadcom.com>
> -----Original Message-----
> From: Joe Perches [mailto:joe@perches.com]
> Sent: Sunday, November 13, 2011 11:12 PM
> To: Dmitry Kravkov
> Cc: davem@davemloft.net; netdev@vger.kernel.org; Ariel Elior; Eilon Greenstein
> Subject: Re: [PATCH v3 net-next 09/12] bnx2x: add fan failure event handling
>
> On Sun, 2011-11-13 at 16:34 +0200, Dmitry Kravkov wrote:
> > From: Ariel Elior <ariele@broadcom.com>
> > Shut down the device in case of fan failure to prevent HW damage.
> trivia.
> > diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
> []
> > @@ -8503,6 +8514,17 @@ sp_rtnl_not_reset:
> []
> > + if (test_and_clear_bit(BNX2X_SP_RTNL_FAN_FAILURE, &bp->sp_rtnl_state)) {
> > + DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver");
>
> Missing a trailing "\n".
>
Reported-by: Joe Perches <joe@perches.com>
Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com>
---
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
index 33ff60d..c09e59a 100644
--- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
+++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
@@ -8520,7 +8520,7 @@ sp_rtnl_not_reset:
* damage
*/
if (test_and_clear_bit(BNX2X_SP_RTNL_FAN_FAILURE, &bp->sp_rtnl_state)) {
- DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver");
+ DP(BNX2X_MSG_SP, "fan failure detected. Unloading driver\n");
netif_device_detach(bp->dev);
bnx2x_close(bp->dev);
}
--
1.7.7.2
^ permalink raw reply related
* Re: [PATCH RESUBMIT2 net-next 1/2] IPv6 routing, NLM_F_* flag support: warn if NEWROUTE lacks of both CREATE and REPLACE flags
From: Matti Vaittinen @ 2011-11-14 8:31 UTC (permalink / raw)
To: ext David Miller; +Cc: netdev
In-Reply-To: <20111114.020617.136865443034405914.davem@davemloft.net>
On Mon, 2011-11-14 at 02:06 -0500, ext David Miller wrote:
> These changes still has problems.
>
> These warning messages are poorly formatted. Some of them don't even indicate
> that ipv6 is involved by printing out an appropriate subsystem prefix.
>
> And frankly, they are ugly, and not the kind of thing I want the networking
> code spitting out.
>
> Consolidate these messages into something that has good form and will indicate,
> consistently, where the trouble is occuring.
Messages can be changed, that's no problem. I would like to hear suggestions
because I personally see not much problems in current messages.
(Of course printing out the subsystem is obvious improvement now that you
mentioned it).
It is hard for me to think something better that would still be short
warning telling what's wrong.
> And I still submit that perhaps these ugly messages are an indicate of how
> bad an idea these changes are in the first place. We've lived with the
> current behavior for 15 years or more, nobody cared. What changed?
Need for IPv6 is what changed for me. I've lived past 15 years purely with IPv4.
I expect this change to be true for more users in the future - although I
cant say I expect IPv6 to be be suddenly utilized everywhere.
> What can't you do with the current setup? And why can't you do it without
> requiring operations that work currently to change semantics?
Short answers: I cant just easily replace a route. And I cannot maintain
old way if I want to respect netlink specifications.
Longer story:
Why I started writing this patch is that I had unpleasant surprizes when I
tried using netlink for IPv6 route manipulations first time. I noticed that
IPv6 does not use netlink API as user (I) could expect.
1. Replacing a route surprised me. I did not expect that request with no
CREATE flag would create a new route. That's what happens. Even with
NLM_F_REPLACE
2. Also it was not possible to replace a route in single netlink message.
What got me even more puzzled is, that no warning, no error, no nothing was
returned to me when I set NLM_F_REPLACE flag. I would have expected -ENOTSUP
if there's no support. Or -EINVAL. I had no idea that flag was not supported.
I addmitt it was propably stupid to expect IPv6 to work in same way as IPv4
does. However I used netlink for IPv4 too, so I assumed whatI tried with IPv6
was correct. Finally I had to look at the kernel sources to find out what
happens. I believe that prints (even ugly ones) are better than making
averaǵe Joe like me to search for the answers from kernel sources.
Anyways, what really improves usability is support for REPLACE flag.
Instead of issuing REPLACE request user now needs to:
1. issue GET request to see if there is routes to given destination
2. explisitly delete the matching route
3. create new route and
4. keep on eye the changes done by other processes.
That is whole a lot more work for user, and that requires route manipulation
tools to do different handling for IPv4 and IPv6 routes.
I believe respecting all flags according to netlink specifications would be
what new users expect. Thats why my patch also handles NLM_F_CREATE and
NLM_F_EXCL flags.
However, I think that in usability point of view the support for
NLM_F_REPLACE would be sufficient. Rest is more to fullfill the
specification.
[siedenote: And if we further ponder the NLM_F_* flags that would
improve usability, then NLM_F_MATCH with GET requests is what I would
like. (Returning only routes matching values given in request).
However that's not really supported by IPv4 either, so not having it in IPv6
is not really a priority.]
Anyways, this is the discussion I hoped for before I wrote the patches.
And in my first post to this list I asked if the support for these flags
was left out in purpose. So please let me know, if you think these changes
are unnecessary and wont be included in any case. Then I can just give up
polluting your list with mails and patches, and stop wasting your and my
time. However if you think these changes are worthy, then no problem,
I can do required fixups to these patches. I just would like to know
what changes are acceptable / wanted.
--
Matti Vaittinen
+358 504863070
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Told a UDP joke the other night...
...but I'm not sure everyone got it...
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
^ permalink raw reply
* [PATCH] ipv4: fix a memory leak in ic_bootp_send_if
From: roy.qing.li @ 2011-11-14 7:54 UTC (permalink / raw)
To: netdev
From: RongQing.Li <roy.qing.li@gmail.com>
when dev_hard_header() failed, the newly allocated skb should be freed.
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
---
net/ipv4/ipconfig.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 0da2afc..7f17ba8 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -822,8 +822,13 @@ static void __init ic_bootp_send_if(struct ic_device *d, unsigned long jiffies_d
skb->dev = dev;
skb->protocol = htons(ETH_P_IP);
if (dev_hard_header(skb, dev, ntohs(skb->protocol),
- dev->broadcast, dev->dev_addr, skb->len) < 0 ||
- dev_queue_xmit(skb) < 0)
+ dev->broadcast, dev->dev_addr, skb->len) < 0) {
+ kfree_skb(skb);
+ printk("E");
+ return;
+ }
+
+ if (dev_queue_xmit(skb) < 0)
printk("E");
}
--
1.7.1
^ permalink raw reply related
* [PATCH] Phonet: set the pipe handle using setsockopt
From: Hemant Vilas RAMDASI @ 2011-11-14 7:53 UTC (permalink / raw)
To: netdev-owner
Cc: netdev, remi.denis-courmont, Dinesh Kumar Sharma, Hemant Ramdasi
From: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
This provides flexibility to set the pipe handle
using setsockopt. The pipe can be enabled (if disabled) later
using ioctl.
Signed-off-by: Hemant Ramdasi <hemant.ramdasi@stericsson.com>
Signed-off-by: Dinesh Kumar Sharma <dinesh.sharma@stericsson.com>
---
include/linux/phonet.h | 3 +
net/phonet/pep.c | 105 +++++++++++++++++++++++++++++++++++++++++++-----
2 files changed, 98 insertions(+), 10 deletions(-)
diff --git a/include/linux/phonet.h b/include/linux/phonet.h
index 6fb1384..4c00551 100644
--- a/include/linux/phonet.h
+++ b/include/linux/phonet.h
@@ -37,6 +37,8 @@
#define PNPIPE_ENCAP 1
#define PNPIPE_IFINDEX 2
#define PNPIPE_HANDLE 3
+#define PNPIPE_ENABLE 4
+#define PNPIPE_INITSTATE 5
#define PNADDR_ANY 0
#define PNADDR_BROADCAST 0xFC
@@ -180,6 +182,7 @@ static inline __u8 pn_sockaddr_get_resource(const struct sockaddr_pn *spn)
/* Phonet device ioctl requests */
#ifdef __KERNEL__
#define SIOCPNGAUTOCONF (SIOCDEVPRIVATE + 0)
+#define SIOPNPIPE_ENABLE _IO(SIOCPNGAUTOCONF, 1)
struct if_phonet_autoconf {
uint8_t device;
diff --git a/net/phonet/pep.c b/net/phonet/pep.c
index f17fd84..48339b9 100644
--- a/net/phonet/pep.c
+++ b/net/phonet/pep.c
@@ -533,6 +533,29 @@ static int pep_connresp_rcv(struct sock *sk, struct sk_buff *skb)
return pipe_handler_send_created_ind(sk);
}
+static int pep_enableresp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+ struct pnpipehdr *hdr = pnp_hdr(skb);
+
+ if (hdr->error_code != PN_PIPE_NO_ERROR)
+ return -ECONNREFUSED;
+
+ return pep_indicate(sk, PNS_PIPE_ENABLED_IND, 0 /* sub-blocks */,
+ NULL, 0, GFP_ATOMIC);
+
+}
+
+static void pipe_start_flow_control(struct sock *sk)
+{
+ struct pep_sock *pn = pep_sk(sk);
+
+ if (!pn_flow_safe(pn->tx_fc)) {
+ atomic_set(&pn->tx_credits, 1);
+ sk->sk_write_space(sk);
+ }
+ pipe_grant_credits(sk, GFP_ATOMIC);
+}
+
/* Queue an skb to an actively connected sock.
* Socket lock must be held. */
static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
@@ -578,13 +601,25 @@ static int pipe_handler_do_rcv(struct sock *sk, struct sk_buff *skb)
sk->sk_state = TCP_CLOSE_WAIT;
break;
}
+ if (pn->init_enable == PN_PIPE_DISABLE)
+ sk->sk_state = TCP_SYN_RECV;
+ else {
+ sk->sk_state = TCP_ESTABLISHED;
+ pipe_start_flow_control(sk);
+ }
+ break;
- sk->sk_state = TCP_ESTABLISHED;
- if (!pn_flow_safe(pn->tx_fc)) {
- atomic_set(&pn->tx_credits, 1);
- sk->sk_write_space(sk);
+ case PNS_PEP_ENABLE_RESP:
+ if (sk->sk_state != TCP_SYN_SENT)
+ break;
+
+ if (pep_enableresp_rcv(sk, skb)) {
+ sk->sk_state = TCP_CLOSE_WAIT;
+ break;
}
- pipe_grant_credits(sk, GFP_ATOMIC);
+
+ sk->sk_state = TCP_ESTABLISHED;
+ pipe_start_flow_control(sk);
break;
case PNS_PEP_DISCONNECT_RESP:
@@ -863,14 +898,31 @@ static int pep_sock_connect(struct sock *sk, struct sockaddr *addr, int len)
int err;
u8 data[4] = { 0 /* sub-blocks */, PAD, PAD, PAD };
- pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+ if (pn->pipe_handle == PN_PIPE_INVALID_HANDLE)
+ pn->pipe_handle = 1; /* anything but INVALID_HANDLE */
+
err = pipe_handler_request(sk, PNS_PEP_CONNECT_REQ,
- PN_PIPE_ENABLE, data, 4);
- if (err) {
- pn->pipe_handle = PN_PIPE_INVALID_HANDLE;
+ pn->init_enable, data, 4);
+ if (err)
return err;
- }
+
+ sk->sk_state = TCP_SYN_SENT;
+
+ return 0;
+}
+
+static int pep_sock_enable(struct sock *sk, struct sockaddr *addr, int len)
+{
+ int err;
+
+ err = pipe_handler_request(sk, PNS_PEP_ENABLE_REQ, PAD,
+ NULL, 0);
+
+ if (err)
+ return err;
+
sk->sk_state = TCP_SYN_SENT;
+
return 0;
}
@@ -894,6 +946,16 @@ static int pep_ioctl(struct sock *sk, int cmd, unsigned long arg)
answ = 0;
release_sock(sk);
return put_user(answ, (int __user *)arg);
+ break;
+
+ case SIOPNPIPE_ENABLE:
+ if (sk->sk_state == TCP_SYN_SENT)
+ return -EBUSY;
+ else if (sk->sk_state == TCP_ESTABLISHED)
+ return -EISCONN;
+ else
+ return pep_sock_enable(sk, NULL, 0);
+ break;
}
return -ENOIOCTLCMD;
@@ -959,6 +1021,18 @@ static int pep_setsockopt(struct sock *sk, int level, int optname,
}
goto out_norel;
+ case PNPIPE_HANDLE:
+ if ((sk->sk_state == TCP_CLOSE) &&
+ (val >= 0) && (val < PN_PIPE_INVALID_HANDLE))
+ pn->pipe_handle = val;
+ else
+ err = -EINVAL;
+ break;
+
+ case PNPIPE_INITSTATE:
+ pn->init_enable = !!val;
+ break;
+
default:
err = -ENOPROTOOPT;
}
@@ -994,6 +1068,17 @@ static int pep_getsockopt(struct sock *sk, int level, int optname,
return -EINVAL;
break;
+ case PNPIPE_ENABLE:
+ if (sk->sk_state == TCP_ESTABLISHED)
+ val = 1;
+ else
+ val = 0;
+ break;
+
+ case PNPIPE_INITSTATE:
+ val = pn->init_enable;
+ break;
+
default:
return -ENOPROTOOPT;
}
--
1.7.4.3
^ permalink raw reply related
* Re: [PATCH] net, bridge: print log message after state changed
From: David Miller @ 2011-11-14 7:07 UTC (permalink / raw)
To: Wolfgang.Fritz; +Cc: netdev, shemminger, bridge, holger.brunck
In-Reply-To: <1321254211.4072.29.camel@pc005091.de.keymile.net>
From: "Fritz, Wolfgang" <Wolfgang.Fritz@keymile.com>
Date: Mon, 14 Nov 2011 08:03:31 +0100
>
> Am Montag, den 14.11.2011, 00:37 -0500 schrieb David Miller:
>> From: Holger Brunck <holger.brunck@keymile.com>
>> Date: Thu, 10 Nov 2011 17:18:54 +0100
>>
>> > From: Wolfgang Fritz <wolfgang.fritz@keymile.com>
>> >
>> > Function br_log_state writes log message "... entering XXXX state" so it
>> > should be called after the state has changed and not before.
>> >
>> > Signed-off-by: Wolfgang Fritz <wolfgang.fritz@keymile.com>
>> > Signed-off-by: Holger Brunck <holger.brunck@keymile.com>
>>
>> "entering" can roughly mean "about to enter"
>>
>
> Exactly.
>
>> The message is therefore appropriately timed as far as I'm concerned.
>>
>
> It's not.
>
> Please test: create a bridge with STP disabled, add an interface to the
> bridge and set that interface down. You get the message "... entering
> forwarding state". That's wrong because it's "about to enter" disabled
> state some lines later.
>
> All other (4) calls to br_log_state are located after the state change,
> see for example br_stp_enable_port() just some lines above the patch.
I would never expect an "entering" message to print out after the event
happens, and in fact I'd _always_ want to see it beforehand so that if
the state change caused a crash or similar it'd be that much easier
to pinpoint the proper location.
I'm still not applying this. If the other log messages behave
differently, they are broken, so go fix those instead.
^ permalink raw reply
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox