Netdev List

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381883949.2045.97.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 05:39:09PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 09:28 +0900, Simon Horman wrote:
> 
> > > I guess things like NFQUEUE could happen ?
> > 
> > Could you expand a little?
> 
> This was to point that between IPVS and ipv6 stack we might have a
> delay, and daddr was maybe pointed to a freed memory.
> 
> IP6CB only uses 24 bytes, so I think you would be safe adding 16 bytes.

That does seem very promising but while implementing it
I hit a problem.

struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
now larger than 48 bytes and no longer fits in skb->cb.

Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
suggests?

* Re: [PATCH V2] For for each TSN t being newly acked (Not only cumulatively, but also SELECTIVELY) cacc_saw_newack should be set to 1.
From: Vlad Yasevich @ 2013-10-16  2:13 UTC (permalink / raw)
  To: Chang Xiangzhong; +Cc: nhorman, davem, linux-sctp, netdev, linux-kernel
In-Reply-To: <1381860784-16481-1-git-send-email-changxiangzhong@gmail.com>

On 10/15/2013 02:13 PM, Chang Xiangzhong wrote:
> Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>

Your proposed solution is very nice, but it does 2 things in one
patch.
  1) It fixes the bug
  2) It refactors the code to improve the flow.

While (2) is very nice, it needs a much more careful review.
Can you please split this into 2 patches?  First patch can
make the code look like this:

	if (sctp_acked(sack, tsn)) {
		...
		if (!tchunk->tsn_gap_acked) {
			tchunk->tsn_gap_acked = 1;
			*highest_new_tsn_in_sack = tsn;
			bytes_acked += sctp_data_size(tchunk);
			if (!tchunk->transport)
				migrate_bytes += sctp_data_size(tchunk);
			forward_progress = true;

			/*
			 * SFR-CACC algorithm:
			 * 2) If the SACK contains gap acks
			 * and the flag CHANGEOVER_ACTIVE is
			 * set the receiver of the SACK MUST
			 * take the following action:
			...
		}
	}

Then you can file a second patch to improve the flow/refactor the 
function.  You have to be very careful here though and be sure to
run through all the regression tests since you would be modifying
a very critical part of the code.

Thanks
-vlad

> ---
>   net/sctp/outqueue.c |   76 ++++++++++++++++++++++++---------------------------
>   1 file changed, 35 insertions(+), 41 deletions(-)
>
> diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
> index 94df758..84ef3b8 100644
> --- a/net/sctp/outqueue.c
> +++ b/net/sctp/outqueue.c
> @@ -1357,13 +1357,13 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>
>   		tsn = ntohl(tchunk->subh.data_hdr->tsn);
>   		if (sctp_acked(sack, tsn)) {
> -			/* If this queue is the retransmit queue, the
> -			 * retransmit timer has already reclaimed
> -			 * the outstanding bytes for this chunk, so only
> -			 * count bytes associated with a transport.
> -			 */
> -			if (transport) {
> -				/* If this chunk is being used for RTT
> +			if (!tchunk->tsn_gap_acked) {
> +				/* If this queue is the retransmit queue, the
> +				 * retransmit timer has already reclaimed
> +				 * the outstanding bytes for this chunk, so only
> +				 * count bytes associated with a transport.
> +				 *
> +				 * If this chunk is being used for RTT
>   				 * measurement, calculate the RTT and update
>   				 * the RTO using this value.
>   				 *
> @@ -1374,28 +1374,44 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				 * first instance of the packet or a later
>   				 * instance).
>   				 */
> -				if (!tchunk->tsn_gap_acked &&
> -				    tchunk->rtt_in_progress) {
> +				if (transport && tchunk->rtt_in_progress) {
>   					tchunk->rtt_in_progress = 0;
>   					rtt = jiffies - tchunk->sent_at;
>   					sctp_transport_update_rto(transport,
> -								  rtt);
> +						rtt);
>   				}
> -			}
>
> -			/* If the chunk hasn't been marked as ACKED,
> -			 * mark it and account bytes_acked if the
> -			 * chunk had a valid transport (it will not
> -			 * have a transport if ASCONF had deleted it
> -			 * while DATA was outstanding).
> -			 */
> -			if (!tchunk->tsn_gap_acked) {
> +				/* If the chunk hasn't been marked as ACKED,
> +				 * mark it and account bytes_acked if the
> +				 * chunk had a valid transport (it will not
> +				 * have a transport if ASCONF had deleted it
> +				 * while DATA was outstanding).
> +				 */
>   				tchunk->tsn_gap_acked = 1;
>   				*highest_new_tsn_in_sack = tsn;
>   				bytes_acked += sctp_data_size(tchunk);
>   				if (!tchunk->transport)
>   					migrate_bytes += sctp_data_size(tchunk);
>   				forward_progress = true;
> +
> +				/* SFR-CACC algorithm:
> +				 * 2) If the SACK contains gap acks
> +				 * and the flag CHANGEOVER_ACTIVE is
> +				 * set the receiver of the SACK MUST
> +				 * take the following action:
> +				 *
> +				 * B) For each TSN t being acked that
> +				 * has not been acked in any SACK so
> +				 * far, set cacc_saw_newack to 1 for
> +				 * the destination that the TSN was
> +				 * sent to.
> +				 */
> +				if (transport &&
> +					sack->num_gap_ack_blocks &&
> +					q->asoc->peer.primary_path->cacc.
> +					changeover_active
> +				   )
> +					transport->cacc.cacc_saw_newack = 1;
>   			}
>
>   			if (TSN_lte(tsn, sack_ctsn)) {
> @@ -1411,30 +1427,8 @@ static void sctp_check_transmitted(struct sctp_outq *q,
>   				restart_timer = 1;
>   				forward_progress = true;
>
> -				if (!tchunk->tsn_gap_acked) {
> -					/*
> -					 * SFR-CACC algorithm:
> -					 * 2) If the SACK contains gap acks
> -					 * and the flag CHANGEOVER_ACTIVE is
> -					 * set the receiver of the SACK MUST
> -					 * take the following action:
> -					 *
> -					 * B) For each TSN t being acked that
> -					 * has not been acked in any SACK so
> -					 * far, set cacc_saw_newack to 1 for
> -					 * the destination that the TSN was
> -					 * sent to.
> -					 */
> -					if (transport &&
> -					    sack->num_gap_ack_blocks &&
> -					    q->asoc->peer.primary_path->cacc.
> -					    changeover_active)
> -						transport->cacc.cacc_saw_newack
> -							= 1;
> -				}
> -
>   				list_add_tail(&tchunk->transmitted_list,
> -					      &q->sacked);
> +					&q->sacked);
>   			} else {
>   				/* RFC2960 7.2.4, sctpimpguide-05 2.8.2
>   				 * M2) Each time a SACK arrives reporting
>

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  2:14 UTC (permalink / raw)
  To: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016005322.GA18135@order.stressinduktion.org>

On Wed, Oct 16, 2013 at 02:53:22AM +0200, Hannes Frederic Sowa wrote:
> On Wed, Oct 16, 2013 at 09:02:31AM +0900, Simon Horman wrote:
> > In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> > ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> > such that it creates and uses a neigh entry if none is found.
> > Subsequently the 'n' field was removed from struct rt6_info.
> > 
> > Unfortunately my analysis is that in the case of IPVS direct routing this
> > change leads to incorrect behaviour as in this case packets may be output
> > to a destination other than where they would be output according to the
> > route table. In particular, the destination address may actually be a local
> > address and empirically a neighbour lookup seems to result in it becoming
> > unreachable.
> > 
> > This patch resolves the problem by providing the destination address
> > determined by IPVS to ip6_finish_output2() in the skb callback.  Although
> > this seems to work I can see several problems with this approach:
> > 
> > * It is rather ugly, stuffing an IPVS exception right in
> >   the middle of IPv6 code. The overhead could be eliminated for many users
> >   by using a static key. But nonetheless it is not attractive.
> > 
> > * The use of the skb callback may not be valid
> >   as it crosses from IPVS to IPv6 code. A possible, though unpleasant,
> >   alternative is to add a new field to struct sk_buff.
> > 
> > * This covers all IPv6 packets output by IPVS but actually
> >   only those output using IPVS Direct-Routing need this.  One way to
> >   resolve this would be to add a more fine-grained ipvs_property to
> >   struct sk_buff.
> 
> Hmm, that reminds me of the following bug report, which it would be nice if
> we could solve in one go, too: http://www.spinics.net/lists/netdev/msg250785.html

I think it should be possible to solve that using the
IP6CB() approach that Eric suggested. Hopefully we can make
that approach fly.

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Eric Dumazet @ 2013-10-16  2:40 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <20131016021304.GA17801@verge.net.au>

On Wed, 2013-10-16 at 11:13 +0900, Simon Horman wrote:

> That does seem very promising but while implementing it
> I hit a problem.
> 
> struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
> expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
> now larger than 48 bytes and no longer fits in skb->cb.
> 
> Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
> suggests?

Ah well...

No, it's not appropriate to grow sk_buff by 16 bytes, sorry.

You could leave struct inet6_skb_parm unchanged, and instead define
IP6CB as a compound:

struct ip6cb {
   struct inet6_skb_parm foo;
   struct in6_addr       bar;
};
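
A minimal user-space sketch of the size arithmetic behind this suggestion. The names carrying a `_sketch` suffix are stand-ins invented here, not kernel definitions. The 24-byte figure for the IP6CB data and the 48-byte size of skb->cb are taken from this thread, and 16 bytes is the size of an IPv6 address. The point is only that the compound fits in the control buffer while struct inet6_skb_parm itself, and therefore struct tcp_skb_cb, keeps its size.

```c
#include <stddef.h>
#include <string.h>

#define SKB_CB_SIZE_SKETCH 48	/* size of skb->cb as quoted in this thread */

/* Opaque stand-ins: only the sizes matter for this illustration. */
struct inet6_skb_parm_sketch { unsigned char opaque[24]; };
struct in6_addr_sketch { unsigned char s6_addr[16]; };

/* Compound layout along the lines suggested above (hypothetical names). */
struct ip6cb_sketch {
	struct inet6_skb_parm_sketch parm;	/* existing per-packet IPv6 state */
	struct in6_addr_sketch daddr;		/* destination chosen by IPVS */
};

/* Compile-time check in the spirit of the kernel's BUILD_BUG_ON(). */
_Static_assert(sizeof(struct ip6cb_sketch) <= SKB_CB_SIZE_SKETCH,
	       "compound control block must fit in skb->cb");

/* Copy an address into the extra slot of a raw 48-byte control buffer. */
static void ip6cb_sketch_set_daddr(unsigned char *cb,
				   const struct in6_addr_sketch *daddr)
{
	memcpy(cb + offsetof(struct ip6cb_sketch, daddr), daddr, sizeof(*daddr));
}

/* Read the address back out of the control buffer. */
static void ip6cb_sketch_get_daddr(const unsigned char *cb,
				   struct in6_addr_sketch *daddr)
{
	memcpy(daddr, cb + offsetof(struct ip6cb_sketch, daddr), sizeof(*daddr));
}
```

With these stand-in sizes the compound occupies 40 of the 48 bytes, and the address sits at offset 24, directly after the existing IPv6 state.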




* Re: [PATCH 02/18] net: use wrapper functions of net_ratelimit() to simplify code
From: Kefeng Wang @ 2013-10-16  3:24 UTC (permalink / raw)
  To: Joe Perches
  Cc: linux-kernel, Greg Kroah-Hartman,
	David S. Miller, Pablo Neira Ayuso, Stephen Hemminger,
	Johannes Berg, John W. Linville, Stanislaw Gruszka, Johannes Berg,
	Francois Romieu, Ben Hutchings, Chas Williams, Marc Kleine-Budde,
	Samuel Ortiz, Paul Mackerras, Oliver Neukum,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, Michael S. Tsirkin, netfilter
In-Reply-To: <1381854263.22110.19.camel@joe-AO722>

Thanks for your reply.

On 10/16 0:24, Joe Perches wrote:
> On Tue, 2013-10-15 at 19:44 +0800, Kefeng Wang wrote:
>> Wrapper functions net_ratelimited_function() and net_XXX_ratelimited()
>> are called to simplify code.
> []
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> []
>> @@ -465,10 +465,8 @@ void br_fdb_update(struct net_bridge *br, struct net_bridge_port *source,
>>  	if (likely(fdb)) {
>>  		/* attempt to update an entry for a local interface */
>>  		if (unlikely(fdb->is_local)) {
>> -			if (net_ratelimit())
>> -				br_warn(br, "received packet on %s with "
>> -					"own address as source address\n",
>> -					source->dev->name);
>> +			net_ratelimited_function(br_warn, br, "received packet on %s "
>> +				"with own address as source address\n", source->dev->name);
> 
> Hello Kefeng.
> 
> When these types of lines are changed, please coalesce the
> fragmented format pieces into a single string.
> 
> It makes grep a bit easier and 80 columns limits don't
> apply to formats.

Got it, I will coalesce them, although the resulting lines will
exceed 80 columns.

> I think using net_ratelimited_function is not particularly
> clarifying here.
> 
> Maybe net_ratelimited_function should be removed instead
> of its use sites expanded.
> 
> Perhaps adding macros like #define br_warn_ratelimited()
> would be better.

Yes, I found that dev_emerg_ratelimited already exists. I will
use such helpers and add some similar macros.

> This comment applies to the whole series.
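
A user-space sketch of the kind of dedicated wrapper discussed in this message. Everything here is an assumption made for illustration: `net_ratelimit_stub()` and `br_warn_stub()` stand in for the kernel's net_ratelimit() and br_warn(), `br_warn_ratelimited_sketch()` is a hypothetical name, and the fixed message budget replaces the kernel's time-based limiting. It shows only the shape of such a macro, with the format kept as one unbroken string as requested in the review.

```c
#include <stdarg.h>
#include <stdio.h>

static int sketch_budget = 3;	/* stand-in limiter: allow three messages in total */
static int sketch_emitted;	/* number of messages actually printed */

/* Stand-in for net_ratelimit(): nonzero while the budget lasts. */
static int net_ratelimit_stub(void)
{
	if (sketch_budget <= 0)
		return 0;
	sketch_budget--;
	return 1;
}

/* Stand-in for br_warn(): prefix the message with the bridge name. */
static void br_warn_stub(const char *br_name, const char *fmt, ...)
{
	va_list ap;

	fprintf(stderr, "%s: ", br_name);
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	sketch_emitted++;
}

/* Hypothetical dedicated wrapper; the format is the first variadic argument. */
#define br_warn_ratelimited_sketch(br_name, ...)		\
	do {							\
		if (net_ratelimit_stub())			\
			br_warn_stub(br_name, __VA_ARGS__);	\
	} while (0)
```

A call site then reads `br_warn_ratelimited_sketch("br0", "received packet on %s with own address as source address\n", "eth0");`, which keeps the whole message greppable.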

* Re: [PATCH 14/18] net: usb: use wrapper functions of net_ratelimit() to simplify code
From: Kefeng Wang @ 2013-10-16  3:27 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: linux-kernel, Greg Kroah-Hartman, David S. Miller,
	Pablo Neira Ayuso, Stephen Hemminger, Johannes Berg,
	John W. Linville, Stanislaw Gruszka, Johannes Berg,
	Francois Romieu, Ben Hutchings, Chas Williams, Marc Kleine-Budde,
	Samuel Ortiz, Paul Mackerras, Oliver Neukum,
	Konrad Rzeszutek Wilk, Boris Ostrovsky, David Vrabel,
	Rusty Russell, Michael S. Tsirkin, netfilter
In-Reply-To: <525D921C.2030709@cogentembedded.com>

Thanks for your reply.
On 10/16 3:06, Sergei Shtylyov wrote:
> Hello.
> 
> On 10/15/2013 03:45 PM, Kefeng Wang wrote:
> 
>> net_ratelimited_function() is called to simplify code.
> 
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   drivers/net/usb/usbnet.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
>> diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
>> index bf94e10..edf81de 100644
>> --- a/drivers/net/usb/usbnet.c
>> +++ b/drivers/net/usb/usbnet.c
>> @@ -450,8 +450,8 @@ void usbnet_defer_kevent (struct usbnet *dev, int work)
>>   {
>>       set_bit (work, &dev->flags);
>>       if (!schedule_work (&dev->kevent)) {
>> -        if (net_ratelimit())
>> -            netdev_err(dev->net, "kevent %d may have been dropped\n", work);
>> +        net_ratelimited_function(netdev_err, dev->net,
>> +            "kevent %d may have been dropped\n", work);
> 
>    The continuation line should start under 'netdev_err'. Same about the other patches where you didn't change the indentation of the continuation lines though you should have.

Got it, indentation will be changed.

> WBR, Sergei

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Ben Pfaff @ 2013-10-16  3:55 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <20131010003139.GA20311@verge.net.au>

On Thu, Oct 10, 2013 at 09:31:41AM +0900, Simon Horman wrote:
> I believe this series addresses the feedback that each of you
> gave with regards to recent previous postings of this patch-set.
> I'm wondering if you could find some time to review it.

I'm waiting for an ack from Jesse, then I'm going to do a final pass and
I hope to commit this series at this point.  I see some ways that we can
improve MPLS support afterward but I don't see any blockers.

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Ben Pfaff @ 2013-10-16  3:56 UTC (permalink / raw)
  To: Simon Horman
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <20131016035530.GA29165@nicira.com>

On Tue, Oct 15, 2013 at 08:55:30PM -0700, Ben Pfaff wrote:
> On Thu, Oct 10, 2013 at 09:31:41AM +0900, Simon Horman wrote:
> > I believe this series addresses the feedback that each of you
> > gave with regards to recent previous postings of this patch-set.
> > I'm wondering if you could find some time to review it.
> 
> I'm waiting for an ack from Jesse, then I'm going to do a final pass and
> I hope to commit this series at this point.  I see some ways that we can
> improve MPLS support afterward but I don't see any blockers.

(Yet.  Somehow MPLS seems to require fractal review.)

* Re: [PATCH v2.43 0/5] MPLS actions and matches
From: Simon Horman @ 2013-10-16  4:23 UTC (permalink / raw)
  To: Ben Pfaff
  Cc: dev, netdev, Jesse Gross, Pravin B Shelar, Ravi K, Isaku Yamahata,
	Joe Stringer
In-Reply-To: <20131016035600.GB29165@nicira.com>

On Tue, Oct 15, 2013 at 08:56:00PM -0700, Ben Pfaff wrote:
> On Tue, Oct 15, 2013 at 08:55:30PM -0700, Ben Pfaff wrote:
> > On Thu, Oct 10, 2013 at 09:31:41AM +0900, Simon Horman wrote:
> > > I believe this series addresses the feedback that each of you
> > > gave with regards to recent previous postings of this patch-set.
> > > I'm wondering if you could find some time to review it.
> > 
> > I'm waiting for an ack from Jesse, then I'm going to do a final pass and
> > I hope to commit this series at this point.  I see some ways that we can
> > improve MPLS support afterward but I don't see any blockers.
> 
> (Yet.  Somehow MPLS seems to require fractal review.)

If you notice something please let me know :)

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Simon Horman @ 2013-10-16  4:26 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Julian Anastasov, Mark Brooks
In-Reply-To: <1381891259.2045.100.camel@edumazet-glaptop.roam.corp.google.com>

On Tue, Oct 15, 2013 at 07:40:59PM -0700, Eric Dumazet wrote:
> On Wed, 2013-10-16 at 11:13 +0900, Simon Horman wrote:
> 
> > That does seem very promising but while implementing it
> > I hit a problem.
> > 
> > struct tcp_skb_cb includes a field of type struct inet6_skb_parm.  And
> > expanding struct inet6_skb_parm by 16 bytes means that struct tcp_skb_cb is
> > now larger than 48 bytes and no longer fits in skb->cb.
> > 
> > Is it appropriate to grow skb->cb as the comment above struct tcp_skb_cb
> > suggests?
> 
> Ah well...
> 
> No, it's not appropriate to grow sk_buff by 16 bytes, sorry.
> 
> You could leave struct inet6_skb_parm unchanged, and instead define
> IP6CB as a compound:
> 
> struct ip6cb {
>    struct inet6_skb_parm foo;
>    struct in6_addr       bar;
> };

Thanks for your guidance, that possibility had occurred to me too.
I'll rework the patch accordingly.

* Re: [PATCH] net: sh_eth: Fix RX packets errors on R8A7740
From: Guennadi Liakhovetski @ 2013-10-16  5:37 UTC (permalink / raw)
  To: Sergei Shtylyov
  Cc: Nguyen Hong Ky, David S. Miller, netdev, Ryusuke Sakato,
	Simon Horman
In-Reply-To: <525DB81C.6010106@cogentembedded.com>

On Wed, 16 Oct 2013, Sergei Shtylyov wrote:

> Hello.
> 
> On 10/15/2013 11:28 AM, Guennadi Liakhovetski wrote:
> 
> > > > This patch will fix RX packets errors when receiving big size
> > > > of data by set bit RNC = 1.
> 
> > > > RNC - Receive Enable Control
> 
> > > > 0: Upon completion of reception of one frame, the E-DMAC writes
> > > > the receive status to the descriptor and clears the RR bit in
> > > > EDRRR to 0.
> 
> > > > 1: Upon completion of reception of one frame, the E-DMAC writes
> > > > (writes back) the receive status to the descriptor. In addition,
> > > > the E-DMAC reads the next descriptor and prepares for reception
> > > > of the next frame.
> 
> > > > In addition, for get more stable when receiving packets, I set
> > > > maximum size for the transmit/receive FIFO and inserts padding
> > > > in receive data.
> 
> > > > Signed-off-by: Nguyen Hong Ky <nh-ky@jinso.co.jp>
> > > > ---
> > > >    drivers/net/ethernet/renesas/sh_eth.c |    4 ++++
> > > >    1 files changed, 4 insertions(+), 0 deletions(-)
> 
> > > > diff --git a/drivers/net/ethernet/renesas/sh_eth.c
> > > > b/drivers/net/ethernet/renesas/sh_eth.c
> > > > index a753928..11d34f0 100644
> > > > --- a/drivers/net/ethernet/renesas/sh_eth.c
> > > > +++ b/drivers/net/ethernet/renesas/sh_eth.c
> > > > @@ -649,12 +649,16 @@ static struct sh_eth_cpu_data r8a7740_data = {
> > > >    	.eesr_err_check	= EESR_TWB1 | EESR_TWB | EESR_TABT | EESR_RABT
> > > > |
> > > >    			  EESR_RFE | EESR_RDE | EESR_RFRMER | EESR_TFE
> > > > |
> > > >    			  EESR_TDE | EESR_ECI,
> > > > +	.fdr_value	= 0x0000070f,
> > > > +	.rmcr_value	= 0x00000001,
> > > > 
> > > >    	.apr		= 1,
> > > >    	.mpr		= 1,
> > > >    	.tpauser	= 1,
> > > >    	.bculr		= 1,
> > > >    	.hw_swap	= 1,
> > > > +	.rpadir		= 1,
> > > > +	.rpadir_value   = 2 << 16,
> > > >    	.no_trimd	= 1,
> > > >    	.no_ade		= 1,
> > > >    	.tsu		= 1,
> 
> > >     Guennadi, could you check if this patch fixes your issue with NFS.
> > > Make
> > > sure it applies to 'r8a7740_data' (it was misapplied to DaveM's tree).
> 
> > Yes, the current -next, which includes this patch (in a slightly different
> > form) boots fine over NFS for me.
> 
>    I don't know what you mean by "slightly different form" exactly. Also, I
> was unable to locate the fresh -next tree.

git://gitorious.org/thierryreding/linux-next.git

> 'net-next.git' contains this patch
> in a mismerged form, 'net.git' has Simon's patch that corrects this mismerge.
> 
> > Thanks
> > Guennadi
> 
> WBR, Sergei
> 

---
Guennadi Liakhovetski, Ph.D.
Freelance Open-Source Software Developer
http://www.open-technology.de/

* [PATCH net-next] {selinux, af_key} Rework pfkey_sadb2xfrm_user_sec_ctx
From: Fan Du @ 2013-10-16  6:15 UTC (permalink / raw)
  To: steffen.klassert; +Cc: davem, netdev

Taking advantage of the fact that sadb_x_sec_ctx and xfrm_user_sec_ctx
share the same structure layout, rework pfkey_sadb2xfrm_user_sec_ctx to
cast sadb_x_sec_ctx into xfrm_user_sec_ctx, with a minor len fix.

Then we can:
 - Avoid the kmalloc/free of memory for xfrm_user_sec_ctx; using
   sadb_x_sec_ctx directly is sufficient.
 - Fix the missing return value check in pfkey_compile_policy for the
   case where kmalloc fails.

Signed-off-by: Fan Du <fan.du@windriver.com>
---
 net/key/af_key.c |   33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/net/key/af_key.c b/net/key/af_key.c
index 9d58537..c7d304d 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -435,22 +435,9 @@ static inline int verify_sec_ctx_len(const void *p)
 
 static inline struct xfrm_user_sec_ctx *pfkey_sadb2xfrm_user_sec_ctx(const struct sadb_x_sec_ctx *sec_ctx)
 {
-	struct xfrm_user_sec_ctx *uctx = NULL;
-	int ctx_size = sec_ctx->sadb_x_ctx_len;
-
-	uctx = kmalloc((sizeof(*uctx)+ctx_size), GFP_KERNEL);
-
-	if (!uctx)
-		return NULL;
+	struct xfrm_user_sec_ctx *uctx = (struct xfrm_user_sec_ctx *)sec_ctx;
 
 	uctx->len = pfkey_sec_ctx_len(sec_ctx);
-	uctx->exttype = sec_ctx->sadb_x_sec_exttype;
-	uctx->ctx_doi = sec_ctx->sadb_x_ctx_doi;
-	uctx->ctx_alg = sec_ctx->sadb_x_ctx_alg;
-	uctx->ctx_len = sec_ctx->sadb_x_ctx_len;
-	memcpy(uctx + 1, sec_ctx + 1,
-	       uctx->ctx_len);
-
 	return uctx;
 }
 
@@ -1125,12 +1112,7 @@ static struct xfrm_state * pfkey_msg2xfrm_state(struct net *net,
 	if (sec_ctx != NULL) {
 		struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_sec_ctx(sec_ctx);
 
-		if (!uctx)
-			goto out;
-
 		err = security_xfrm_state_alloc(x, uctx);
-		kfree(uctx);
-
 		if (err)
 			goto out;
 	}
@@ -2225,14 +2207,7 @@ static int pfkey_spdadd(struct sock *sk, struct sk_buff *skb, const struct sadb_
 	if (sec_ctx != NULL) {
 		struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_sec_ctx(sec_ctx);
 
-		if (!uctx) {
-			err = -ENOBUFS;
-			goto out;
-		}
-
 		err = security_xfrm_policy_alloc(&xp->security, uctx);
-		kfree(uctx);
-
 		if (err)
 			goto out;
 	}
@@ -2329,11 +2304,7 @@ static int pfkey_spddelete(struct sock *sk, struct sk_buff *skb, const struct sa
 	if (sec_ctx != NULL) {
 		struct xfrm_user_sec_ctx *uctx = pfkey_sadb2xfrm_user_sec_ctx(sec_ctx);
 
-		if (!uctx)
-			return -ENOMEM;
-
 		err = security_xfrm_policy_alloc(&pol_ctx, uctx);
-		kfree(uctx);
 		if (err)
 			return err;
 	}
@@ -3230,8 +3201,6 @@ static struct xfrm_policy *pfkey_compile_policy(struct sock *sk, int opt,
 			goto out;
 		uctx = pfkey_sadb2xfrm_user_sec_ctx(sec_ctx);
 		*dir = security_xfrm_policy_alloc(&xp->security, uctx);
-		kfree(uctx);
-
 		if (*dir)
 			goto out;
 	}
-- 
1.7.9.5
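
A small sketch of how the "same structure arrangement" premise of this patch could be checked at compile time. The two replica structs below are assumptions about the layouts of struct sadb_x_sec_ctx and struct xfrm_user_sec_ctx, written out here with `_sketch` names so the block stands alone; they should be compared against the real headers before relying on them. `sec_ctx_len_sketch()` is likewise an assumed model of the "minor len fix": the xfrm header counts its length in bytes, header plus trailing context.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed replica of the PF_KEY security context extension header. */
struct sadb_x_sec_ctx_sketch {
	uint16_t sadb_x_sec_len;
	uint16_t sadb_x_sec_exttype;
	uint8_t  sadb_x_ctx_alg;
	uint8_t  sadb_x_ctx_doi;
	uint16_t sadb_x_ctx_len;
} __attribute__((packed));

/* Assumed replica of the xfrm user security context header. */
struct xfrm_user_sec_ctx_sketch {
	uint16_t len;
	uint16_t exttype;
	uint8_t  ctx_alg;
	uint8_t  ctx_doi;
	uint16_t ctx_len;
};

/* True when a PF_KEY field and an xfrm field start at the same offset. */
#define SAME_SLOT(a, b)						\
	(offsetof(struct sadb_x_sec_ctx_sketch, a) ==		\
	 offsetof(struct xfrm_user_sec_ctx_sketch, b))

/* The cast in the patch is only sound if every one of these holds. */
_Static_assert(sizeof(struct sadb_x_sec_ctx_sketch) ==
	       sizeof(struct xfrm_user_sec_ctx_sketch), "header sizes differ");
_Static_assert(SAME_SLOT(sadb_x_sec_len, len), "len offset differs");
_Static_assert(SAME_SLOT(sadb_x_sec_exttype, exttype), "exttype offset differs");
_Static_assert(SAME_SLOT(sadb_x_ctx_alg, ctx_alg), "ctx_alg offset differs");
_Static_assert(SAME_SLOT(sadb_x_ctx_doi, ctx_doi), "ctx_doi offset differs");
_Static_assert(SAME_SLOT(sadb_x_ctx_len, ctx_len), "ctx_len offset differs");

/* Assumed model of the length fix: header bytes plus context bytes. */
static uint16_t sec_ctx_len_sketch(const struct sadb_x_sec_ctx_sketch *sec_ctx)
{
	return (uint16_t)(sizeof(struct xfrm_user_sec_ctx_sketch) +
			  sec_ctx->sadb_x_ctx_len);
}
```

If the real layouts ever diverged, assertions of this kind would turn a silent misread into a build failure.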

* Re: DomU's network interface will hung when Dom0 running 32bit
From: Annie Li @ 2013-10-16  6:42 UTC (permalink / raw)
  To: Ian.Campbell; +Cc: netdev, xen-devel, jianhai.luan, wei.liu2


On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
> On 2013-10-14 19:19, Wei Liu wrote:
> > On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
> >> Hi Ian,
> >>    I recently ran into an issue where a DomU's network interface hangs,
> >> and I have been working on it since. I find that a DomU network
> >> interface that sends few packets will hang if Dom0 is running 32-bit
> >> and the DomU's uptime is very long.  I think there is a jiffies
> >> overflow bug in the function tx_credit_exceeded().
> >>    I know the inline function time_after_eq(a,b) handles jiffies
> >> overflow, but it has one limit: a must be less than (b +
> >> MAX_SIGNAL_LONG). If a is larger than that value, time_after_eq will
> >> return false. MAX_SIGNAL_LONG should be 0x7fffffff on a 32-bit
> >> machine.
> >>    If the DomU's network interface sends few packets (<0.5k/s if
> >> HZ=250 and credit_bytes=ULONG_MAX), jiffies will go beyond
> >> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
> >> next_credit) will fail (it should be true). So a timer is set that
> >> will not fire for a long time, and later processing is aborted while
> >> timer_pending(&vif->credit_timeout) is true. As a result, the
> >> DomU's network interface hangs for a long time (> 40 days).
> >>    Please consider the scenario below:
> >>    Condition:
> >>      Dom0 running 32-bit and HZ = 1000
> >>      vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
> >> = 0xffffffff, vif->credit_usec=0 jiffies=0
> >>      vif receives few packets (DomU sends few packets). If the
> >> rate is less than 2K/s, consuming 4G (0xffffffff) will take 582.55
> >> hours. jiffies will become larger than 0x7fffffff. Suppose jiffies =
> >> 0x800000ff: time_after_eq(0x800000ff, 0xffffffff) will fail, and
> >> a timer whose expiry is 0xffffffff will be left pending. So
> >> the interface will hang until jiffies counts up to 0xffffffff again
> >> (which takes a very long time).
> > If I'm not mistaken you meant time_after_eq(now, next_credit) in
> > netback. How does next_credit become 0xffffffff?
> 
> I am only assuming the value is 0xffffffff; the exact value of
> next_credit isn't the point. If the delta between now and next_credit
> is larger than ULONG_MAX, time_after_eq will make the wrong judgement.

So it sounds like we need a timer which is independent of the traffic
being sent to keep credit_timeout.expires rolling over.

Would that be a timer set to less than ULONG_MAX/2, to avoid
credit_timeout.expires rolling over? The problem is that we cannot be
sure where jiffies starts from, so this would probably lead to the
current issue again.
I assume Jason's patch fixes this issue; that patch only uses
__mod_timer to add a timer with next_credit when netback fails to send
out the currently available credits.

Thanks
Annie

> >
> > Wei.
> >
> >>    If some error exist in above explain, please help me point it out.
> >>
> >> Thanks,
> >> Jason
> 



* Re: [PATCH] X.25: Fix address field length calculation
From: Kelleter, Günther @ 2013-10-16  6:58 UTC (permalink / raw)
  To: Joe Perches
  Cc: andrew.hendry@gmail.com, davem@davemloft.net,
	linux-x25@vger.kernel.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
In-Reply-To: <1381858190.22110.25.camel@joe-AO722>

On 15.10.2013 19:29, Joe Perches wrote:
> On Tue, 2013-10-15 at 14:29 +0000, Kelleter, Günther wrote:
>> Addresses are BCD encoded, not ASCII. x25_addr_ntoa got it right.
> []
>> Wrong length calculation leads to rejection of CALL ACCEPT packets.
> []
>> diff --git a/net/x25/af_x25.c b/net/x25/af_x25.c
> []
>> @@ -98,7 +98,7 @@ int x25_parse_address_block(struct sk_buff *skb,
>>  	}
>>   	len = *skb->data;
>> -	needed = 1 + (len >> 4) + (len & 0x0f);
>> +	needed = 1 + ((len >> 4) + (len & 0x0f) + 1) / 2;
> This calculation looks odd.
> Perhaps use bcd.h instead?
>

It's just the same calculation as in x25_addr_ntoa (last line) and it's
used this way by X.25.
Two digits are encoded into one byte, and the last byte is padded with 0
if the total number of digits is odd.
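
The digit-to-byte arithmetic described in this message can be sketched as follows. The function names are invented for this illustration; only the two expressions come from the patch quoted in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes needed for the address block with BCD-packed digits: one length
 * octet, then two digits per byte, rounding up when the total digit
 * count is odd.
 */
static size_t x25_addr_block_needed(uint8_t len_octet)
{
	unsigned int digits = (len_octet >> 4) + (len_octet & 0x0f);

	return 1 + (digits + 1) / 2;
}

/* The calculation before the patch, which counts one byte per digit. */
static size_t x25_addr_block_needed_old(uint8_t len_octet)
{
	return 1 + (len_octet >> 4) + (len_octet & 0x0f);
}
```

For a length octet of 0x34 (seven digits in total) the BCD form needs 5 bytes, whereas the per-digit form asks for 8, which is how a short but valid packet could end up rejected.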

* Re: DomU's network interface will hung when Dom0 running 32bit
From: annie li @ 2013-10-16  7:10 UTC (permalink / raw)
  To: Ian Campbell; +Cc: jianhai luan, Wei Liu, xen-devel, netdev
In-Reply-To: <1381826609.24708.135.camel@kazak.uk.xensource.com>


On 2013-10-15 16:43, Ian Campbell wrote:
> On Tue, 2013-10-15 at 10:44 +0800, jianhai luan wrote:
>> On 2013-10-14 19:19, Wei Liu wrote:
>>> On Sat, Oct 12, 2013 at 04:53:18PM +0800, jianhai luan wrote:
>>>> Hi Ian,
>>>>     I recently ran into an issue where a DomU's network interface hangs,
>>>> and I have been working on it since. I find that a DomU network
>>>> interface that sends few packets will hang if Dom0 is running 32-bit
>>>> and the DomU's uptime is very long.  I think there is a jiffies
>>>> overflow bug in the function tx_credit_exceeded().
>>>>     I know the inline function time_after_eq(a,b) handles jiffies
>>>> overflow, but it has one limit: a must be less than (b +
>>>> MAX_SIGNAL_LONG). If a is larger than that value, time_after_eq will
>>>> return false. MAX_SIGNAL_LONG should be 0x7fffffff on a 32-bit
>>>> machine.
>>>>     If the DomU's network interface sends few packets (<0.5k/s if
>>>> HZ=250 and credit_bytes=ULONG_MAX), jiffies will go beyond
>>>> (credit_timeout.expires + MAX_SIGNAL_LONG) and time_after_eq(now,
>>>> next_credit) will fail (it should be true). So a timer is set that
>>>> will not fire for a long time, and later processing is aborted while
>>>> timer_pending(&vif->credit_timeout) is true. As a result, the
>>>> DomU's network interface hangs for a long time (> 40 days).
>>>>     Please consider the scenario below:
>>>>     Condition:
>>>>       Dom0 running 32-bit and HZ = 1000
>>>>       vif->credit_timeout->expire = 0xffffffff, vif->remaining_credit
>>>> = 0xffffffff, vif->credit_usec=0 jiffies=0
>>>>       vif receives few packets (DomU sends few packets). If the
>>>> rate is less than 2K/s, consuming 4G (0xffffffff) will take 582.55
>>>> hours. jiffies will become larger than 0x7fffffff. Suppose jiffies =
>>>> 0x800000ff: time_after_eq(0x800000ff, 0xffffffff) will fail, and
>>>> a timer whose expiry is 0xffffffff will be left pending. So
>>>> the interface will hang until jiffies counts up to 0xffffffff again
>>>> (which takes a very long time).
>>> If I'm not mistaken you meant time_after_eq(now, next_credit) in
>>> netback. How does next_credit become 0xffffffff?
>> I am only assuming the value is 0xffffffff; the exact value of
>> next_credit isn't the point. If the delta between now and next_credit
>> is larger than ULONG_MAX, time_after_eq will make the wrong judgement.
> So it sounds like we need a timer which is independent of the traffic
> being sent to keep credit_timeout.expires rolling over.
Resending because of the broken formatting in my last email...

Would that be a timer set to less than ULONG_MAX/2, to avoid
credit_timeout.expires rolling over? The problem is that we cannot be
sure where jiffies starts from, so this would probably lead to the
current issue again.
I assume Jason's patch fixes this issue; that patch only uses
__mod_timer to add a timer with next_credit when netback fails to send
out the currently available credits.

Thanks
Annie


>
> Can you propose a patch?
>
> Ian.
>
>>> Wei.
>>>
>>>>     If some error exist in above explain, please help me point it out.
>>>>
>>>> Thanks,
>>>> Jason

^ permalink raw reply

* [PATCH v3] net: sctp: fix a cacc_saw_newack missetting issue
From: Chang Xiangzhong @ 2013-10-16  7:22 UTC (permalink / raw)
  To: vyasevich
  Cc: nhorman, davem, linux-sctp, netdev, linux-kernel,
	Chang Xiangzhong

For each TSN t being newly acked (not only cumulatively,
but also SELECTIVELY), cacc_saw_newack should be set to 1.

Signed-off-by: Xiangzhong Chang <changxiangzhong@gmail.com>
---
 net/sctp/outqueue.c |   53 ++++++++++++++++++++++++---------------------------
 1 file changed, 25 insertions(+), 28 deletions(-)

diff --git a/net/sctp/outqueue.c b/net/sctp/outqueue.c
index 94df758..da9a778 100644
--- a/net/sctp/outqueue.c
+++ b/net/sctp/outqueue.c
@@ -1383,19 +1383,38 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 				}
 			}
 
-			/* If the chunk hasn't been marked as ACKED,
-			 * mark it and account bytes_acked if the
-			 * chunk had a valid transport (it will not
-			 * have a transport if ASCONF had deleted it
-			 * while DATA was outstanding).
-			 */
 			if (!tchunk->tsn_gap_acked) {
+				/* If the chunk hasn't been marked as ACKED,
+				 * mark it and account bytes_acked if the
+				 * chunk had a valid transport (it will not
+				 * have a transport if ASCONF had deleted it
+				 * while DATA was outstanding).
+				 */
 				tchunk->tsn_gap_acked = 1;
 				*highest_new_tsn_in_sack = tsn;
 				bytes_acked += sctp_data_size(tchunk);
 				if (!tchunk->transport)
 					migrate_bytes += sctp_data_size(tchunk);
 				forward_progress = true;
+
+				/*
+				 * SFR-CACC algorithm:
+				 * 2) If the SACK contains gap acks
+				 * and the flag CHANGEOVER_ACTIVE is
+				 * set the receiver of the SACK MUST
+				 * take the following action:
+				 *
+				 * B) For each TSN t being acked that
+				 * has not been acked in any SACK so
+				 * far, set cacc_saw_newack to 1 for
+				 * the destination that the TSN was
+				 * sent to.
+				 */
+				if (transport &&
+					sack->num_gap_ack_blocks &&
+					q->asoc->peer.primary_path->cacc.
+					changeover_active)
+					transport->cacc.cacc_saw_newack = 1;
 			}
 
 			if (TSN_lte(tsn, sack_ctsn)) {
@@ -1411,28 +1430,6 @@ static void sctp_check_transmitted(struct sctp_outq *q,
 				restart_timer = 1;
 				forward_progress = true;
 
-				if (!tchunk->tsn_gap_acked) {
-					/*
-					 * SFR-CACC algorithm:
-					 * 2) If the SACK contains gap acks
-					 * and the flag CHANGEOVER_ACTIVE is
-					 * set the receiver of the SACK MUST
-					 * take the following action:
-					 *
-					 * B) For each TSN t being acked that
-					 * has not been acked in any SACK so
-					 * far, set cacc_saw_newack to 1 for
-					 * the destination that the TSN was
-					 * sent to.
-					 */
-					if (transport &&
-					    sack->num_gap_ack_blocks &&
-					    q->asoc->peer.primary_path->cacc.
-					    changeover_active)
-						transport->cacc.cacc_saw_newack
-							= 1;
-				}
-
 				list_add_tail(&tchunk->transmitted_list,
 					      &q->sacked);
 			} else {
-- 
1.7.9.5

^ permalink raw reply related

* Re: [RFC net-next] ipv6: Use destination address determined by IPVS
From: Julian Anastasov @ 2013-10-16  7:27 UTC (permalink / raw)
  To: Simon Horman
  Cc: YOSHIFUJI Hideaki / 吉藤英明, lvs-devel,
	netdev, Mark Brooks
In-Reply-To: <1381881751-6719-1-git-send-email-horms@verge.net.au>


	Hello,

On Wed, 16 Oct 2013, Simon Horman wrote:

> In v3.9 6fd6ce2056de2709 ("ipv6: Do not depend on rt->n in
> ip6_finish_output2()") changed the behaviour of ip6_finish_output2()
> such that it creates and uses a neigh entry if none is found.
> Subsequently the 'n' field was removed from struct rt6_info.

	A similar change in IPv4 opened Pandora's box:
IPVS, xt_TEE, raw.c (IP_HDRINCL). Maybe the corresponding
places in IPv6 have the same problem.

	I don't know the IPv6 routing code, but if we find a way
to keep the desired nexthop in rt6i_gateway and to add
RTF_GATEWAY checks here and there, such a solution would be more
general. The FLOWI_FLAG_KNOWN_NH flag can help, if needed.

Regards

--
Julian Anastasov <ja@ssi.bg>

^ permalink raw reply

* Re: DomU's network interface will hung when Dom0 running 32bit
From: jianhai luan @ 2013-10-16  7:35 UTC (permalink / raw)
  To: Wei Liu; +Cc: Ian Campbell, xen-devel, netdev, ANNIE LI
In-Reply-To: <20131015125802.GR11739@zion.uk.xensource.com>

[-- Attachment #1: Type: text/plain, Size: 1755 bytes --]


On 2013-10-15 20:58, Wei Liu wrote:
> On Tue, Oct 15, 2013 at 07:26:31PM +0800, jianhai luan wrote:
> [...]
>>>>> Can you propose a patch?
>>>> Because credit_timeout.expires is always after jiffies, I detect
>>>> values outside the range of time_after_eq() with time_before(now,
>>>> vif->credit_timeout.expires). Please check the patch.
>>> I don't think this really fixes the issue for you. You still have a chance
>>> that now wraps around and falls between expires and next_credit. In that
>>> case it's stalled again.
>> If time_before(now, vif->credit_timeout.expires) is true, time has
>> wrapped and we do the operation. Otherwise, when time_before(now,
>> vif->credit_timeout.expires) is false, now -
>> vif->credit_timeout.expires should be less than ULONG_MAX/2.
>> Because next_credit is larger than vif->credit_timeout.expires
>> (next_credit = vif->credit_timeout.expires +
>> msecs_to_jiffies(vif->credit_usec/1000)), the delta between now and
>> next_credit should be in the range of time_after_eq().  So
>> time_after_eq() judges correctly.
>>
> Not sure I understand you. Consider "now" is placed like this:
>
>     expires   now   next_credit
>     ----time increases this direction--->
>
> * time_after_eq(now, next_credit) -> false
> * time_before(now, expires) -> false
>
> Then it's stuck again. You're merely narrowing the window, not fixing
> the real problem.

The case above does not get stuck again. Netback will queue a timer
to handle it.

The attached program shows that if !(time_after_eq(now, next_credit)
|| time_before(now, vif->credit_timeout.expires)), now can only lie in
the window above, [expires, next_credit), and that window will be
handled by the timer soon.
>
> Wei.
>
>> Jason
>>> Wei.
Jason.

[-- Attachment #2: main.c --]
[-- Type: text/plain, Size: 961 bytes --]

#include <stdio.h>

#define typecheck(type,x) \
({	type __dummy; \
	typeof(x) __dummy2; \
	(void)(&__dummy == &__dummy2); \
	1; \
})

#define time_after(a, b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((b) - (a)) < 0))
#define time_before(a,b)	time_after(b,a)

#define time_after_eq(a,b)		\
	(typecheck(unsigned char, a) && \
	 typecheck(unsigned char, b) && \
	 ((char)((a) -(b)) >= 0))
#define time_before_eq(a, b) time_after_eq(b,a)

void do_nothing()
{
	return;
}

int main()
{
	unsigned char expire, now, next;
	unsigned char delta = 10;
	int i, j;

	for(i = 0; i < 256; i++) {
		expire = i;
		next = expire + delta;

		printf("\n\n\n[%u ... %u]\n", expire, next);
		now = expire;
		for(j=0; j < 1024; j++, now++) {	
			if(j%256 == 0) printf("\n");

			if (time_after_eq(now, next) ||
				time_before(now, expire)) {
				do_nothing();
			}
			else {
				printf("    now=%d\n", (char)now);
			}
		}
	}
	
	return 0;
}

^ permalink raw reply

* Re: [PATCH v4 net 0/3] sctp: Use software checksum under certain
From: Daniel Borkmann @ 2013-10-16  8:02 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: netdev, linux-sctp, fan.du
In-Reply-To: <1381888891-31186-1-git-send-email-vyasevich@gmail.com>

On 10/16/2013 04:01 AM, Vlad Yasevich wrote:
> There are some cards that support SCTP checksum offloading.  When using
> these cards with IPSec or forcing IP fragmentation of SCTP traffic,
> the checksum is computed incorrectly due to the fact that xfrm and IP/IPv6
> fragmentation code do not know that this is SCTP traffic and do not
> know that checksum has to be computed differently.
>
> To fix this, we let SCTP detect these conditions and perform software
> checksum calculation.
>
> Fan Du (1):
>    sctp: Use software crc32 checksum when xfrm transform will happen.
>
> Vlad Yasevich (2):
>    net: dst: provide accessor function to dst->xfrm
>    sctp: Perform software checksum if packet has to be fragmented.
>
>   include/net/dst.h | 12 ++++++++++++
>   net/sctp/output.c |  3 ++-
>   2 files changed, 14 insertions(+), 1 deletion(-)
>

Acked-by: Daniel Borkmann <dborkman@redhat.com>

^ permalink raw reply

* [PATCH v2 net 4/4] bridge: Fix updating FDB entries when the PVID is applied
From: Toshiaki Makita @ 2013-10-16  8:07 UTC (permalink / raw)
  To: David S . Miller, Vlad Yasevich, netdev; +Cc: Toshiaki Makita, Toshiaki Makita
In-Reply-To: <1381910836-718-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

We currently set the value that the variable vid points to, which is
used in the FDB later, to 0 in br_allowed_ingress() when we receive untagged
or priority-tagged frames, even though the PVID is valid.
As a result, FDB entries are wrongly learned with VID 0.
Update the value to that of the PVID if the PVID is applied.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
---
 net/bridge/br_vlan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 5a9c44a..53f0990 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -217,6 +217,7 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
 		/* PVID is set on this port.  Any untagged or priority-tagged
 		 * ingress frame is considered to belong to this vlan.
 		 */
+		*vid = pvid;
 		if (likely(err))
 			/* Untagged Frame. */
 			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), pvid);
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH v2 net 0/4] bridge: Fix problems around the PVID
From: Toshiaki Makita @ 2013-10-16  8:07 UTC (permalink / raw)
  To: David S . Miller, Vlad Yasevich, netdev; +Cc: Toshiaki Makita, Toshiaki Makita

There seem to be some undesirable behaviors related to the PVID.
1. Assigning a PVID to a port has no effect. The PVID cannot be applied
to any frame regardless of whether we set it or not.
2. FDB entries learned via frames to which the PVID is applied are
registered with VID 0 rather than the VID value of the PVID.
3. We can set 0 or 4095 as a PVID, which is not allowed in IEEE 802.1Q.
This leads to interoperability problems such as sending frames with VID
4095, which is not allowed in IEEE 802.1Q, and treating frames with VID
0 as belonging to VLAN 0, although they are expected to be handled as
having no VID according to IEEE 802.1Q.

Note: The 2nd and 3rd problems are latent and not exposed unless the 1st
problem is fixed, because we cannot activate the PVID due to it.

This is my analysis of each behavior.
1. We use the VLAN_TAG_PRESENT bit when getting the PVID, but not when
adding/deleting the PVID.
It can be fixed either way, by using or not using VLAN_TAG_PRESENT,
but I think the latter is slightly more efficient.

2. We set skb->vlan_tci to the value of the PVID, but the variable
vid, which is used in the FDB later, is set to 0 in br_allowed_ingress()
when untagged frames arrive at a port with a valid PVID. I think
vid should be updated to the value of the PVID if the PVID is valid.

3. According to IEEE 802.1Q-2011 (6.9.1 and Table 9-2), we cannot use
VID 0 or 4095 as a PVID.
There is more to consider here.

- VID 0:
VID 0 shall not be configured in any FDB entry and is used in a tag header
to indicate that it is an 802.1p priority-tagged frame.
The PVID should be applied to priority-tagged frames (from IEEE 802.1Q 6.9.1).
In my opinion, since we can filter incoming priority-tagged frames by
deleting the PVID, we don't need to filter them by vlan_bitmap.
In other words, priority-tagged frames don't have VID 0 but have no VID,
the same as untagged frames, and should be filtered by unsetting the
PVID.
So not only can we not set the PVID to 0, but we also don't need to add 0
to vlan_bitmap, which lets us simply forbid adding vlan 0.

- VID 4095:
VID 4095 shall not be transmitted in a tag header. This VID value may be
used to indicate a wildcard match for the VID in management operations or
FDB entries (from IEEE 802.1Q Table 9-2).
In the current implementation, we can create a static FDB entry with all
existing VIDs by not specifying any VID when creating it.
I don't think this way of adding wildcard-like entries needs to change,
so VID 4095 is of no use and adding it can be rejected.

Consequently, I believe what we should do for 3rd problem is below:
- Not allowing VID 0 and 4095 to be added.
- Applying PVID to priority-tagged (VID 0) frames.
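
The rules above can be sketched in plain C as follows. This is an illustration only, not the bridge code: the names vid_is_configurable(), ingress_vid() and apply_pvid_to_tci(), the VID_MASK and PVID_UNSET constants, and the convention that a stored PVID of 0 means "not set" are assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define VID_MASK   0x0fffu  /* low 12 bits of the TCI carry the VID */
#define PVID_UNSET 0u       /* assumption: 0 means "no PVID configured" */

/* Only VIDs 1..4094 may be added; 0 and 4095 are rejected. */
static inline bool vid_is_configurable(uint16_t vid)
{
	return vid >= 1 && vid <= 4094;
}

/* Decide which VLAN an ingress frame belongs to.
 * tagged: whether the frame carried a VLAN tag at all.
 * tci:    the tag control information (ignored when !tagged).
 * Returns false if the frame is untagged or priority-tagged and no
 * PVID is set; otherwise stores the resulting VID in *vid. */
static inline bool ingress_vid(bool tagged, uint16_t tci, uint16_t pvid,
			       uint16_t *vid)
{
	uint16_t v = tagged ? (uint16_t)(tci & VID_MASK) : 0;

	if (v == 0) {
		/* Untagged or priority-tagged (VID 0): classify by PVID. */
		if (pvid == PVID_UNSET)
			return false;
		v = pvid;
	}
	*vid = v;
	return true;
}

/* For a priority-tagged frame, fill in the VID field and leave the
 * upper four bits of the TCI (including the PCP) untouched. */
static inline uint16_t apply_pvid_to_tci(uint16_t tci, uint16_t pvid)
{
	return (uint16_t)((tci & ~VID_MASK) | (pvid & VID_MASK));
}
```

In this sketch an untagged frame and a priority-tagged frame both end up in the PVID's VLAN, so the VID used for FDB learning is the PVID rather than 0.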

Note: It has been discovered that another problem related to priority tags
remains. If we use a vlan 0 interface such as eth0.0, we cannot communicate
with another end station via a linux bridge.
This problem exists regardless of whether this patch set is applied or not
because we might receive untagged frames from another end station even if we
are sending priority-tagged frames.
This issue will be addressed by another patch set introducing an additional
egress policy, on which Vlad Yasevich is working.
See http://marc.info/?t=137880893800001&r=1&w=2 for detailed discussion.

Patch set follows this mail.
The order of patches is not the same as described above, because the way
to fix the 1st problem is based on the assumption that we don't use VID 0 as
a PVID, which is realized by fixing the 3rd problem.
(1/4)(2/4): Fix 3rd problem.
(3/4): Fix 1st problem.
(4/4): Fix 2nd problem.

v2:
- Add descriptions about the problem related to priority-tags in cover letter.
- Revise patch comments to reference the newest spec.

Toshiaki Makita (4):
  bridge: Don't use VID 0 and 4095 in vlan filtering
  bridge: Apply the PVID to priority-tagged frames
  bridge: Fix the way the PVID is referenced
  bridge: Fix updating FDB entries when the PVID is applied

 net/bridge/br_fdb.c     |   4 +-
 net/bridge/br_netlink.c |   2 +-
 net/bridge/br_private.h |   4 +-
 net/bridge/br_vlan.c    | 125 ++++++++++++++++++++++++++----------------------
 4 files changed, 71 insertions(+), 64 deletions(-)

-- 
1.8.1.2

^ permalink raw reply

* [PATCH v2 net 3/4] bridge: Fix the way the PVID is referenced
From: Toshiaki Makita @ 2013-10-16  8:07 UTC (permalink / raw)
  To: David S . Miller, Vlad Yasevich, netdev; +Cc: Toshiaki Makita, Toshiaki Makita
In-Reply-To: <1381910836-718-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

We are using the VLAN_TAG_PRESENT bit to detect whether the PVID is
set or not in br_get_pvid(), while we ignore the bit when
adding/deleting the PVID, which makes it impossible to forward any
incoming untagged frame with vlan_filtering enabled.

Since vid 0 cannot be used for the PVID, we can use vid 0 to indicate
that the PVID is not set, which is slightly more efficient than using
VLAN_TAG_PRESENT.

Fix the problem by no longer using VLAN_TAG_PRESENT.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
---
 net/bridge/br_private.h | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index efb57d9..7ca2ae4 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -643,9 +643,7 @@ static inline u16 br_get_pvid(const struct net_port_vlans *v)
 	 * vid wasn't set
 	 */
 	smp_rmb();
-	return (v->pvid & VLAN_TAG_PRESENT) ?
-			(v->pvid & ~VLAN_TAG_PRESENT) :
-			VLAN_N_VID;
+	return v->pvid ?: VLAN_N_VID;
 }

 #else
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH v2 net 1/4] bridge: Don't use VID 0 and 4095 in vlan filtering
From: Toshiaki Makita @ 2013-10-16  8:07 UTC (permalink / raw)
  To: David S . Miller, Vlad Yasevich, netdev; +Cc: Toshiaki Makita, Toshiaki Makita
In-Reply-To: <1381910836-718-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

IEEE 802.1Q says that:
- VID 0 shall not be configured as a PVID, or configured in any Filtering
Database entry.
- VID 4095 shall not be configured as a PVID, or transmitted in a tag
header. This VID value may be used to indicate a wildcard match for the VID
in management operations or Filtering Database entries.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Don't accept adding these VIDs in the vlan_filtering implementation.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Reviewed-by: Vlad Yasevich <vyasevic@redhat.com>
---
 net/bridge/br_fdb.c     |  4 +-
 net/bridge/br_netlink.c |  2 +-
 net/bridge/br_vlan.c    | 97 +++++++++++++++++++++++--------------------------
 3 files changed, 49 insertions(+), 54 deletions(-)

diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
index ffd5874..33e8f23 100644
--- a/net/bridge/br_fdb.c
+++ b/net/bridge/br_fdb.c
@@ -700,7 +700,7 @@ int br_fdb_add(struct ndmsg *ndm, struct nlattr *tb[],
 
 		vid = nla_get_u16(tb[NDA_VLAN]);
 
-		if (vid >= VLAN_N_VID) {
+		if (!vid || vid >= VLAN_VID_MASK) {
 			pr_info("bridge: RTM_NEWNEIGH with invalid vlan id %d\n",
 				vid);
 			return -EINVAL;
@@ -794,7 +794,7 @@ int br_fdb_delete(struct ndmsg *ndm, struct nlattr *tb[],
 
 		vid = nla_get_u16(tb[NDA_VLAN]);
 
-		if (vid >= VLAN_N_VID) {
+		if (!vid || vid >= VLAN_VID_MASK) {
 			pr_info("bridge: RTM_NEWNEIGH with invalid vlan id %d\n",
 				vid);
 			return -EINVAL;
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index e74ddc1..f75d92e 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -243,7 +243,7 @@ static int br_afspec(struct net_bridge *br,
 
 		vinfo = nla_data(tb[IFLA_BRIDGE_VLAN_INFO]);
 
-		if (vinfo->vid >= VLAN_N_VID)
+		if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
 			return -EINVAL;
 
 		switch (cmd) {
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 9a9ffe7..21b6d21 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -45,37 +45,34 @@ static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
 		return 0;
 	}
 
-	if (vid) {
-		if (v->port_idx) {
-			p = v->parent.port;
-			br = p->br;
-			dev = p->dev;
-		} else {
-			br = v->parent.br;
-			dev = br->dev;
-		}
-		ops = dev->netdev_ops;
-
-		if (p && (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
-			/* Add VLAN to the device filter if it is supported.
-			 * Stricly speaking, this is not necessary now, since
-			 * devices are made promiscuous by the bridge, but if
-			 * that ever changes this code will allow tagged
-			 * traffic to enter the bridge.
-			 */
-			err = ops->ndo_vlan_rx_add_vid(dev, htons(ETH_P_8021Q),
-						       vid);
-			if (err)
-				return err;
-		}
-
-		err = br_fdb_insert(br, p, dev->dev_addr, vid);
-		if (err) {
-			br_err(br, "failed insert local address into bridge "
-			       "forwarding table\n");
-			goto out_filt;
-		}
+	if (v->port_idx) {
+		p = v->parent.port;
+		br = p->br;
+		dev = p->dev;
+	} else {
+		br = v->parent.br;
+		dev = br->dev;
+	}
+	ops = dev->netdev_ops;
+
+	if (p && (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) {
+		/* Add VLAN to the device filter if it is supported.
+		 * Stricly speaking, this is not necessary now, since
+		 * devices are made promiscuous by the bridge, but if
+		 * that ever changes this code will allow tagged
+		 * traffic to enter the bridge.
+		 */
+		err = ops->ndo_vlan_rx_add_vid(dev, htons(ETH_P_8021Q),
+					       vid);
+		if (err)
+			return err;
+	}
 
+	err = br_fdb_insert(br, p, dev->dev_addr, vid);
+	if (err) {
+		br_err(br, "failed insert local address into bridge "
+		       "forwarding table\n");
+		goto out_filt;
 	}
 
 	set_bit(vid, v->vlan_bitmap);
@@ -98,7 +95,7 @@ static int __vlan_del(struct net_port_vlans *v, u16 vid)
 	__vlan_delete_pvid(v, vid);
 	clear_bit(vid, v->untagged_bitmap);
 
-	if (v->port_idx && vid) {
+	if (v->port_idx) {
 		struct net_device *dev = v->parent.port->dev;
 		const struct net_device_ops *ops = dev->netdev_ops;
 
@@ -248,7 +245,9 @@ bool br_allowed_egress(struct net_bridge *br,
 	return false;
 }
 
-/* Must be protected by RTNL */
+/* Must be protected by RTNL.
+ * Must be called with vid in range from 1 to 4094 inclusive.
+ */
 int br_vlan_add(struct net_bridge *br, u16 vid, u16 flags)
 {
 	struct net_port_vlans *pv = NULL;
@@ -278,7 +277,9 @@ out:
 	return err;
 }
 
-/* Must be protected by RTNL */
+/* Must be protected by RTNL.
+ * Must be called with vid in range from 1 to 4094 inclusive.
+ */
 int br_vlan_delete(struct net_bridge *br, u16 vid)
 {
 	struct net_port_vlans *pv;
@@ -289,14 +290,9 @@ int br_vlan_delete(struct net_bridge *br, u16 vid)
 	if (!pv)
 		return -EINVAL;
 
-	if (vid) {
-		/* If the VID !=0 remove fdb for this vid. VID 0 is special
-		 * in that it's the default and is always there in the fdb.
-		 */
-		spin_lock_bh(&br->hash_lock);
-		fdb_delete_by_addr(br, br->dev->dev_addr, vid);
-		spin_unlock_bh(&br->hash_lock);
-	}
+	spin_lock_bh(&br->hash_lock);
+	fdb_delete_by_addr(br, br->dev->dev_addr, vid);
+	spin_unlock_bh(&br->hash_lock);
 
 	__vlan_del(pv, vid);
 	return 0;
@@ -329,7 +325,9 @@ unlock:
 	return 0;
 }
 
-/* Must be protected by RTNL */
+/* Must be protected by RTNL.
+ * Must be called with vid in range from 1 to 4094 inclusive.
+ */
 int nbp_vlan_add(struct net_bridge_port *port, u16 vid, u16 flags)
 {
 	struct net_port_vlans *pv = NULL;
@@ -363,7 +361,9 @@ clean_up:
 	return err;
 }
 
-/* Must be protected by RTNL */
+/* Must be protected by RTNL.
+ * Must be called with vid in range from 1 to 4094 inclusive.
+ */
 int nbp_vlan_delete(struct net_bridge_port *port, u16 vid)
 {
 	struct net_port_vlans *pv;
@@ -374,14 +374,9 @@ int nbp_vlan_delete(struct net_bridge_port *port, u16 vid)
 	if (!pv)
 		return -EINVAL;
 
-	if (vid) {
-		/* If the VID !=0 remove fdb for this vid. VID 0 is special
-		 * in that it's the default and is always there in the fdb.
-		 */
-		spin_lock_bh(&port->br->hash_lock);
-		fdb_delete_by_addr(port->br, port->dev->dev_addr, vid);
-		spin_unlock_bh(&port->br->hash_lock);
-	}
+	spin_lock_bh(&port->br->hash_lock);
+	fdb_delete_by_addr(port->br, port->dev->dev_addr, vid);
+	spin_unlock_bh(&port->br->hash_lock);
 
 	return __vlan_del(pv, vid);
 }
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH v2 net 2/4] bridge: Apply the PVID to priority-tagged frames
From: Toshiaki Makita @ 2013-10-16  8:07 UTC (permalink / raw)
  To: David S . Miller, Vlad Yasevich, netdev; +Cc: Toshiaki Makita, Toshiaki Makita
In-Reply-To: <1381910836-718-1-git-send-email-makita.toshiaki@lab.ntt.co.jp>

IEEE 802.1Q says that when we receive a priority-tagged (VID 0) frame,
we use the PVID of the port as its VID.
(See IEEE 802.1Q-2011 6.9.1 and Table 9-2)

Apply the PVID not only to untagged frames but also to priority-tagged frames.

Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
---
 net/bridge/br_vlan.c | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 21b6d21..5a9c44a 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -189,6 +189,8 @@ out:
 bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
 			struct sk_buff *skb, u16 *vid)
 {
+	int err;
+
 	/* If VLAN filtering is disabled on the bridge, all packets are
 	 * permitted.
 	 */
@@ -201,20 +203,31 @@ bool br_allowed_ingress(struct net_bridge *br, struct net_port_vlans *v,
 	if (!v)
 		return false;
 
-	if (br_vlan_get_tag(skb, vid)) {
+	err = br_vlan_get_tag(skb, vid);
+	if (!*vid) {
 		u16 pvid = br_get_pvid(v);
 
-		/* Frame did not have a tag.  See if pvid is set
-		 * on this port.  That tells us which vlan untagged
-		 * traffic belongs to.
+		/* Frame had a tag with VID 0 or did not have a tag.
+		 * See if pvid is set on this port.  That tells us which
+		 * vlan untagged or priority-tagged traffic belongs to.
 		 */
 		if (pvid == VLAN_N_VID)
 			return false;
 
-		/* PVID is set on this port.  Any untagged ingress
-		 * frame is considered to belong to this vlan.
+		/* PVID is set on this port.  Any untagged or priority-tagged
+		 * ingress frame is considered to belong to this vlan.
 		 */
-		__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), pvid);
+		if (likely(err))
+			/* Untagged Frame. */
+			__vlan_hwaccel_put_tag(skb, htons(ETH_P_8021Q), pvid);
+		else
+			/* Priority-tagged Frame.
+			 * At this point, We know that skb->vlan_tci had
+			 * VLAN_TAG_PRESENT bit and its VID field was 0x000.
+			 * We update only VID field and preserve PCP field.
+			 */
+			skb->vlan_tci |= pvid;
+
 		return true;
 	}
 
-- 
1.8.1.2

^ permalink raw reply related
