Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-27  7:28 UTC (permalink / raw)
  To: hagen; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100527070827.GB2728@nuttenaction>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu, 27 May 2010 09:08:27 +0200

> And by the way, the IETF (and our) paradigm is still to shift functionality to
> end hosts - not into network core. "The Rise of the stupid network" [1] is
> still a paradigm that is superior to the alternative where vendors put their
> proprietary algorithms into the network and change the behavior in a
> uncontrollable fashion.

Superior or not, it's simply never going to happen.  We are far beyond
being able to get to where we were before NAT'ing and shaping devices
started to get inserted everywhere on the network.

And I also don't see any of this stuff as fundamentally proprietary.

People want deep packet inspection, people want to control their user's
traffic.  And people, most importantly, are willing to pay for this.

Therefore, these elements will always be in the network.

Better to co-exist with them and use them to our advantage instead of
fantasizing about a utopia where they don't exist.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: Hagen Paul Pfeifer @ 2010-05-27  7:08 UTC (permalink / raw)
  To: David Miller; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526.200443.232751390.davem@davemloft.net>

* David Miller | 2010-05-26 20:04:43 [-0700]:

>You're asking about a network level issue in terms of what can be done
>on a local end-node.

No, I *write* about network level issues, this is the important item in my
mind.  It is about network stability and network fairness. The lion share of
TCP algorithm are drafted to guarantee _network fairness and network stability_.

And by the way, the IETF (and our) paradigm is still to shift functionality to
end hosts - not into network core. "The Rise of the stupid network" [1] is
still a paradigm that is superior to the alternative where vendors put their
proprietary algorithms into the network and change the behavior in a
uncontrollable fashion.

>All an end-node can do is abide by congestion control rules and respond
>to packet drops, as has been going on for decades.

Right, and this will be reality for the next decades (at least for TCP;
maybe backed by ECN).

>People have basically (especially in Europe) given up on crazy crap
>like RSVP and other forms of bandwidth limiting and reservation.  They
>just oversubscribe their links, and increase their capacity as traffic
>increases dictate.  It just isn't all that manageable to put people's
>traffic into classes and control what they do on a large scale.
>
>I'm also skeptical about those who say the fight belongs squarely at
>the end nodes.  If you want to control the network traffic of the
>meeting point of your dumbbell, you'll need a machine there doing RED
>or traffic limiting.  End-host schemes simply aren't going to work
>because I can just add more end-hosts to reintroduce the problem.

I am not happy with this statement. This differs from the previous paragraph
where you complain about intelligent network components. Davem until these
days the routers do exactly this, they do RED/WRED whatever and signal to the
producer to reduce their bandwidth.

And this is the most important aspect in this email: core network components
rely on end hosts to behave in a fair manner. Disable Slow Start/Congestion
Avoidance and the network will instantly collapse (mmh, net-next? ;-)

The mechanism as proposed in the patch is not fair. There are a lot of
publications available that analyse the impact CWND in great detail as well as
several RFC that talk about the CWND.

>The dumbbell situation is independant of the end-node issues, that's
>all I'm really saying.

Davem, I know that you are a good guy and worries about fairness aspects
really well. I wrote this email to popularize fairness and network stability
aspects to the broad audience.

Hagen

[1] http://isen.com/stupid.html

>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Die Zensur ist das lebendige Gestaendnis der Grossen, dass sie 
nur verdummte Sklaven treten, aber keine freien Voelker regieren koennen.
- Johann Nepomuk Nestroy

^ permalink raw reply

* Re: [RFC] IFLA_PORT_* iproute2 cmd line
From: Arnd Bergmann @ 2010-05-27  7:00 UTC (permalink / raw)
  To: Scott Feldman
  Cc: netdev, Chris Wright, Stephen Hemminger, Stefan Berger,
	Dirk Herrendoerfer, Vivek Kashyap
In-Reply-To: <C8228316.35508%scofeldm@cisco.com>

On Wednesday 26 May 2010, Scott Feldman wrote:
> On 5/26/10 5:38 AM, "Arnd Bergmann" <arnd@arndb.de> wrote:
> 
> How does this strike you?
> 
>   Usage: ip link add link DEV [ name ] NAME
>                      [ txqueuelen PACKETS ]
>                      [ address LLADDR ]
>                      [ broadcast LLADDR ]
>                      [ mtu MTU ]
>                      type TYPE [ ARGS ]
>          ip link delete DEV type TYPE [ ARGS ]
> 
>          ip link set DEVICE [ { up | down } ]
>                             [ arp { on | off } ]
>                             [ dynamic { on | off } ]
>                             [ multicast { on | off } ]
>                             [ allmulticast { on | off } ]
>                             [ promisc { on | off } ]
>                             [ trailers { on | off } ]
>                             [ txqueuelen PACKETS ]
>                             [ name NEWNAME ]
>                             [ address LLADDR ]
>                             [ broadcast LLADDR ]
>                             [ mtu MTU ]
>                             [ netns PID ]
>                             [ alias NAME ]
> +                           [ virtualport MODE {PROFILE|VSI} ]
>                             [ vf NUM [ mac LLADDR ]
>                                      [ vlan VLANID [ qos VLAN-QOS ] ]
> -                                    [ rate TXRATE ] ]
> +                                    [ rate TXRATE ]
> +                                    [ virtualport MODE {PROFILE|VSI} ] ]
>          ip link show [ DEVICE ]
> 
>   TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | can }
> +
> + MODE := { associate | preassociate | preassociaterr | disassociate }
> +
> + PROFILE := port-profile PORT-PROFILE
> +            [ instance-uuid INSTANCE-UUID ]
> +            [ host-uuid HOST-UUID ]
> + 
> + VSI := vsi managerid MGR typeid VTID typeidversion VER
> +        [ instance-uuid INSTANCE-UUID ]

Right, exactly what I was thinking of. I would probably use
some shorter keywords (virtual-port -> port, port-profile -> profile,
instance-uuid -> instance, host-uuid -> host), but I really
don't have a strong opionion on that, so I'f fine with
whatever you and others come up with in that regard.

> > The more interesting question is how to do this when we
> > talk to lldpad. One idea was to use the same protocol
> > but to direct the message to a specific pid (that of lldpad).
> > That would require adding an option like '-p PID' to ip
> > that lets us change who we talk to.
> 
> Let me get the user-to-kernel part working to establish the cmd line and you
> can follow up with alternative addressing schemes.

ok.

	Arnd

^ permalink raw reply

* Re: [patch 2/2] be2net: remove superfluous externs
From: Sarveshwar Bandi @ 2010-05-27  6:32 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sathya Perla, Subbu Seetharaman, Ajit Khaparde, David S. Miller,
	netdev, kernel-janitors
In-Reply-To: <20100526144739.GM22515@bicker>

Thanks.

Acked-by: Sarveshwar Bandi <sarveshwarb@serverengines.com>
On 26/05/10 16:47 +0200, Dan Carpenter wrote:
> This fixes some sparse warnings:
> drivers/net/benet/be_cmds.c:1503:12: warning: function
> 	'be_cmd_enable_magic_wol' with external linkage has definition
> drivers/net/benet/be_cmds.c:1668:12: warning: function
> 	'be_cmd_get_seeprom_data' with external linkage has definition
> 
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index c911bfb..f9c0d4d 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -1497,7 +1500,7 @@ err:
>  	return status;
>  }
>  
> -extern int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
> +int be_cmd_enable_magic_wol(struct be_adapter *adapter, u8 *mac,
>  				struct be_dma_mem *nonemb_cmd)
>  {
>  	struct be_mcc_wrb *wrb;
> @@ -1662,7 +1665,7 @@ err:
>  	return status;
>  }
>  
> -extern int be_cmd_get_seeprom_data(struct be_adapter *adapter,
> +int be_cmd_get_seeprom_data(struct be_adapter *adapter,
>  				struct be_dma_mem *nonemb_cmd)
>  {
>  	struct be_mcc_wrb *wrb;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [patch 1/2] be2net: add unlock on error path
From: Sarveshwar Bandi @ 2010-05-27  6:31 UTC (permalink / raw)
  To: Dan Carpenter
  Cc: Sathya Perla, Subbu Seetharaman, Ajit Khaparde, David S. Miller,
	netdev, kernel-janitors
In-Reply-To: <20100526144634.GL22515@bicker>

Thanks

Acked-by: Sarveshwar Bandi <sarveshwarb@serverengines.com> 
On 26/05/10 16:46 +0200, Dan Carpenter wrote:
> The unlock accidentally got removed from the error path in dd131e76e5:
> "be2net: Bug fix to avoid disabling bottom half during firmware upgrade."
>
> Signed-off-by: Dan Carpenter <error27@gmail.com>
> 
> diff --git a/drivers/net/benet/be_cmds.c b/drivers/net/benet/be_cmds.c
> index c911bfb..18d5789 100644
> --- a/drivers/net/benet/be_cmds.c
> +++ b/drivers/net/benet/be_cmds.c
> @@ -1429,7 +1429,7 @@ int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
>  	wrb = wrb_from_mccq(adapter);
>  	if (!wrb) {
>  		status = -EBUSY;
> -		goto err;
> +		goto err_unlock;
>  	}
>  	req = cmd->va;
>  	sge = nonembedded_sgl(wrb);
> @@ -1457,7 +1457,10 @@ int be_cmd_write_flashrom(struct be_adapter *adapter, struct be_dma_mem *cmd,
>  	else
>  		status = adapter->flash_status;
>  
> -err:
> +	return status;
> +
> +err_unlock:
> +	spin_unlock_bh(&adapter->mcc_lock);
>  	return status;
>  }
>  
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Anton Blanchard @ 2010-05-27  6:09 UTC (permalink / raw)
  To: David Miller; +Cc: eric.dumazet, netdev
In-Reply-To: <20100526.222332.233688655.davem@davemloft.net>


> > [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
> > 
> > This new sock lock primitive was introduced to speedup some user context
> > socket manipulation. But it is unsafe to protect two threads, one using
> > regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> > 
> > This patch changes lock_sock_bh to be careful against 'owned' state.
> > If owned is found to be set, we must take the slow path.
> > lock_sock_bh() now returns a boolean to say if the slow path was taken,
> > and this boolean is used at unlock_sock_bh time to call the appropriate
> > unlock function.
> > 
> > After this change, BH are either disabled or enabled during the
> > lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> > so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> > 
> > Reported-by: Anton Blanchard <anton@samba.org>
> > Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> 
> Looks good, I'll wait for positive testing from Anton before applying
> this.

Thanks guys, this fixed it.

Tested-by: Anton Blanchard <anton@samba.org>

Anton

^ permalink raw reply

* Re: [PATCH 3/3] cxgb4i_v3: iscsi and libcxgbi library for handling common part
From: Rakesh Ranjan @ 2010-05-27  6:05 UTC (permalink / raw)
  To: Mike Christie
  Cc: open-iscsi-/JYPxA39Uh5TLH3MbocFFw, Rakesh Ranjan, NETDEVML,
	SCSIDEVML, LKML, Karen Xie, David Miller, James Bottomley,
	Anish Bhatt
In-Reply-To: <4BFE0A3E.9020804-hcNo3dDEHLuVc3sceRu5cw@public.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

On 05/27/2010 11:29 AM, Mike Christie wrote:
> On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
>> From: Rakesh Ranjan<rranjan-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>>
>>
>> Signed-off-by: Rakesh Ranjan<rakesh-ut6Up61K2wZBDgjK7y7TUQ@public.gmane.org>
>> ---
>>   drivers/scsi/cxgb4i/cxgb4i_iscsi.c |  617
>> ++++++++++++++++++++++++++++++++++++
>>   drivers/scsi/cxgb4i/libcxgbi.c     |  589
>> ++++++++++++++++++++++++++++++++++
>>   drivers/scsi/cxgb4i/libcxgbi.h     |  430 +++++++++++++++++++++++++
> 
> I think the patch had some whitespace/newline issues. When I did git-am
> I got:
> 
> warning: squelched 1 whitespace error
> warning: 6 lines add whitespace errors.
> 
> I think James can just fix up when he merges with git, but in the future
> you might want to try a git-am on your patch before you send (git-am
> --whitespace=fix will fix it up for you).
> 
> 
> 
>> +
>> +int cxgbi_pdu_init(struct cxgbi_device *cdev)
>> +{
>> +    if (cdev->skb_tx_headroom>  (512 * MAX_SKB_FRAGS))
>> +        cdev->skb_extra_headroom = cdev->skb_tx_headroom;
>> +    cdev->pad_page = alloc_page(GFP_KERNEL);
>> +    if (cdev->pad_page)
>> +        return -ENOMEM;
>> +    memset(page_address(cdev->pad_page), 0, PAGE_SIZE);
>> +    return 0;
>> +
>> +    /*
>> +    if (SKB_TX_HEADROOM>  (512 * MAX_SKB_FRAGS))
>> +        skb_extra_headroom = SKB_TX_HEADROOM;
>> +    pad_page = alloc_page(GFP_KERNEL);
>> +    if (!pad_page)
>> +        return -ENOMEM;
>> +    memset(page_address(pad_page), 0, PAGE_SIZE);
>> +    return 0;*/
> 
> Clean this up.
> 
>> +
>> +void cxgbi_release_itt(struct iscsi_task *task, itt_t hdr_itt)
>> +{
>> +    struct scsi_cmnd *sc = task->sc;
>> +    struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
>> +    struct cxgbi_conn *cconn = tcp_conn->dd_data;
>> +    struct cxgbi_device *cdev = cconn->chba->cdev;
>> +    struct cxgbi_tag_format *tformat =&cdev->tag_format;
>> +    u32 tag = ntohl((__force u32)hdr_itt);
>> +
>> +    cxgbi_tag_debug("release tag 0x%x.\n", tag);
>> +
>> +    if (sc&&
>> +        (scsi_bidi_cmnd(sc) ||
>> +         sc->sc_data_direction == DMA_FROM_DEVICE)&&
>> +            cxgbi_is_ddp_tag(tformat, tag))
> 
> The formatting is a little weird. I think you want each line to start in
> the same column instead of each getting tabbed over.
> 
> 
>> +
>> +
>> +int cxgbi_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
>> +{
>> +    struct scsi_cmnd *sc = task->sc;
>> +    struct iscsi_conn *conn = task->conn;
>> +    struct iscsi_session *sess = conn->session;
>> +    struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
>> +    struct cxgbi_conn *cconn = tcp_conn->dd_data;
>> +    struct cxgbi_device *cdev = cconn->chba->cdev;
>> +    struct cxgbi_tag_format *tformat =&cdev->tag_format;
>> +    u32 sw_tag = (sess->age<<  cconn->task_idx_bits) | task->itt;
>> +    u32 tag;
>> +    int err = -EINVAL;
>> +
>> +    if (sc&&
>> +        (scsi_bidi_cmnd(sc) ||
>> +         sc->sc_data_direction == DMA_FROM_DEVICE)&&
>> +            cxgbi_sw_tag_usable(tformat, sw_tag)) {
> 
> 
> Same tabbing.
> 
> 
>> +    volatile unsigned int state;
> 
> I did not get why this needed to be volatile (I see it is like this in
> cxgb3i and I did not see why in there too).
> 
> 
> 
>> +
>> +static inline void cxgbi_sock_hold(struct cxgbi_sock *csk)
>> +{
>> +    atomic_inc(&csk->refcnt);
> 
> 
> We want people to use krefs instead of their own refcounting now.
> 
>> +
>> +static inline void *cplhdr(struct sk_buff *skb)
>> +{
>> +    return skb->data;
>> +}
> 
> 
> Seems kinda useless.

Hi Mike,

Thanks for the review, will send you the updated patch.

Regards
Rakesh Ranjan

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iQIcBAEBCgAGBQJL/guKAAoJEBqoHbxtDU4JFvQP/2uIWFmZo4nqcJ7xmEiBjnaq
BeUfjbRGaiAImEM/6OuMIB/Fm9p9NGtW0GsZdqKYSdZBmgO1kWNT1onCZgVF4eW1
Muikjo+I2wXGsWIt4ZqbZs722COC1oQHXpyulfh/6ONzeWJ3R8vos/TnCw256Wxj
HL42jIO882JCImzmP/wmdPmDlTUI/Z0UEYbu0djy20yzESo2NDHq1PYYopyXIboH
joJ/sZvK0jRYrRsDdq84XJ8CUv1yDE8m3NiMV8cV9JN1EwP/gzBrq0DXfFgpdEDU
+4/pfzzxJ5y/ekdZaSZdx9ke/n/vmamJCaQONaL2WfQaznogxqZypaoM/NRzfMlH
40KY5lqajGRnm0aEBffn8uqBzEbopYpb1m+3mzt/vDtvdBRuSJqmcPbUandTNHLe
cZQnn689GtWqJfoyVF/dfyubtAZFLu8VVZkFxx1e3UDEqVDUuwVEHXoXSt6K/SkB
UnZI7hMUYSof+WI+UEV8DR4KMsRbE1cnvXGnsXTZOMhMsU3KoMielaGTgepAbxT7
bLfaARiMwvU+vle5546uK8IuxjGH81nbUEy6R86OeS6Osv6PZFiGYP9yxjzkp4K7
NWOTnVV9qTaPPTtN9SORINPXUXuI90JgIDbh9qnfDDKl2oj5HmwBpa7S3KmMxxlI
eoAqa/9LLjrMVi5Wicv3
=TOBq
-----END PGP SIGNATURE-----

-- 
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-iscsi-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
To unsubscribe from this group, send email to open-iscsi+unsubscribe-/JYPxA39Uh5TLH3MbocFF+G/Ez6ZCGd0@public.gmane.org
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

^ permalink raw reply

* Re: [PATCH 3/3] cxgb4i_v3: iscsi and libcxgbi library for handling common part
From: Mike Christie @ 2010-05-27  5:59 UTC (permalink / raw)
  To: open-iscsi
  Cc: Rakesh Ranjan, NETDEVML, SCSIDEVML, LKML, Karen Xie, David Miller,
	James Bottomley, Anish Bhatt, Rakesh Ranjan
In-Reply-To: <1273944249-311-4-git-send-email-rakesh@chelsio.com>

On 05/15/2010 12:24 PM, Rakesh Ranjan wrote:
> From: Rakesh Ranjan<rranjan@chelsio.com>
>
>
> Signed-off-by: Rakesh Ranjan<rakesh@chelsio.com>
> ---
>   drivers/scsi/cxgb4i/cxgb4i_iscsi.c |  617 ++++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/libcxgbi.c     |  589 ++++++++++++++++++++++++++++++++++
>   drivers/scsi/cxgb4i/libcxgbi.h     |  430 +++++++++++++++++++++++++

I think the patch had some whitespace/newline issues. When I did git-am 
I got:

warning: squelched 1 whitespace error
warning: 6 lines add whitespace errors.

I think James can just fix up when he merges with git, but in the future 
you might want to try a git-am on your patch before you send (git-am 
--whitespace=fix will fix it up for you).



> +
> +int cxgbi_pdu_init(struct cxgbi_device *cdev)
> +{
> +	if (cdev->skb_tx_headroom>  (512 * MAX_SKB_FRAGS))
> +		cdev->skb_extra_headroom = cdev->skb_tx_headroom;
> +	cdev->pad_page = alloc_page(GFP_KERNEL);
> +	if (cdev->pad_page)
> +		return -ENOMEM;
> +	memset(page_address(cdev->pad_page), 0, PAGE_SIZE);
> +	return 0;
> +
> +	/*
> +	if (SKB_TX_HEADROOM>  (512 * MAX_SKB_FRAGS))
> +		skb_extra_headroom = SKB_TX_HEADROOM;
> +	pad_page = alloc_page(GFP_KERNEL);
> +	if (!pad_page)
> +		return -ENOMEM;
> +	memset(page_address(pad_page), 0, PAGE_SIZE);
> +	return 0;*/

Clean this up.

> +
> +void cxgbi_release_itt(struct iscsi_task *task, itt_t hdr_itt)
> +{
> +	struct scsi_cmnd *sc = task->sc;
> +	struct iscsi_tcp_conn *tcp_conn = task->conn->dd_data;
> +	struct cxgbi_conn *cconn = tcp_conn->dd_data;
> +	struct cxgbi_device *cdev = cconn->chba->cdev;
> +	struct cxgbi_tag_format *tformat =&cdev->tag_format;
> +	u32 tag = ntohl((__force u32)hdr_itt);
> +
> +	cxgbi_tag_debug("release tag 0x%x.\n", tag);
> +
> +	if (sc&&
> +		(scsi_bidi_cmnd(sc) ||
> +		 sc->sc_data_direction == DMA_FROM_DEVICE)&&
> +			cxgbi_is_ddp_tag(tformat, tag))

The formatting is a little weird. I think you want each line to start in 
the same column instead of each getting tabbed over.


> +
> +
> +int cxgbi_reserve_itt(struct iscsi_task *task, itt_t *hdr_itt)
> +{
> +	struct scsi_cmnd *sc = task->sc;
> +	struct iscsi_conn *conn = task->conn;
> +	struct iscsi_session *sess = conn->session;
> +	struct iscsi_tcp_conn *tcp_conn = conn->dd_data;
> +	struct cxgbi_conn *cconn = tcp_conn->dd_data;
> +	struct cxgbi_device *cdev = cconn->chba->cdev;
> +	struct cxgbi_tag_format *tformat =&cdev->tag_format;
> +	u32 sw_tag = (sess->age<<  cconn->task_idx_bits) | task->itt;
> +	u32 tag;
> +	int err = -EINVAL;
> +
> +	if (sc&&
> +		(scsi_bidi_cmnd(sc) ||
> +		 sc->sc_data_direction == DMA_FROM_DEVICE)&&
> +			cxgbi_sw_tag_usable(tformat, sw_tag)) {


Same tabbing.


> +	volatile unsigned int state;

I did not get why this needed to be volatile (I see it is like this in 
cxgb3i and I did not see why in there too).



> +
> +static inline void cxgbi_sock_hold(struct cxgbi_sock *csk)
> +{
> +	atomic_inc(&csk->refcnt);


We want people to use krefs instead of their own refcounting now.

> +
> +static inline void *cplhdr(struct sk_buff *skb)
> +{
> +	return skb->data;
> +}


Seems kinda useless.

^ permalink raw reply

* [PATCH] mlx4_en: use net_device dev_id to indicate port number
From: Eli Cohen @ 2010-05-27  5:56 UTC (permalink / raw)
  To: davem-fT/PcQaiUtIeIZ0/mPfg9Q
  Cc: netdev-u79uwXL29TY76Z2rM5mHXA, linux-rdma-u79uwXL29TY76Z2rM5mHXA,
	rdreier-FYB4Gu1CFyUAvxtiuMwx3w, yevgenyp-VPRAkNaXOzVS1MOuV/RT9w,
	shemminger-ZtmgI6mnKB3QT0dZR+AlfA

Today, there are no means to know which port of a hardware device a netdev
interface uses. struct net_device conatins a field, dev_id, that can be used
for that. Use this field to save the port number in ConnectX that is being used
by the net device; port numbers are zero based.

Signed-off-by: Eli Cohen <eli-VPRAkNaXOzVS1MOuV/RT9w@public.gmane.org>
---
 drivers/net/mlx4/en_netdev.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/net/mlx4/en_netdev.c b/drivers/net/mlx4/en_netdev.c
index 6c2b15b..ab1d480 100644
--- a/drivers/net/mlx4/en_netdev.c
+++ b/drivers/net/mlx4/en_netdev.c
@@ -978,6 +978,7 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
 	}
 
 	SET_NETDEV_DEV(dev, &mdev->dev->pdev->dev);
+	dev->dev_id =  port - 1;
 
 	/*
 	 * Initialize driver private data
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply related

* [PATCH] net: add rtnl assertion to linkwatch_run_queue
From: Denis Kirjanov @ 2010-05-27  5:51 UTC (permalink / raw)
  To: davem; +Cc: eric.dumazet, shemminger, herbert, kaber, netdev

Add RTNL assertion to linkwatch_run_queue since function 
must be called with semaphore held

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
---
net/core/link_watch.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/net/core/link_watch.c b/net/core/link_watch.c
index bdbce2f..e6d06c0 100644
--- a/net/core/link_watch.c
+++ b/net/core/link_watch.c
@@ -220,6 +220,7 @@ void linkwatch_forget_dev(struct net_device *dev)
 /* Must be called with the rtnl semaphore held */
 void linkwatch_run_queue(void)
 {
+	ASSERT_RTNL();
 	__linkwatch_run_queue(0);
 }
 

^ permalink raw reply related

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: David Miller @ 2010-05-27  5:23 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev
In-Reply-To: <1274937618.2542.46.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 May 2010 07:20:18 +0200

> [PATCH v2] net: fix lock_sock_bh/unlock_sock_bh
> 
> This new sock lock primitive was introduced to speedup some user context
> socket manipulation. But it is unsafe to protect two threads, one using
> regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> 
> This patch changes lock_sock_bh to be careful against 'owned' state.
> If owned is found to be set, we must take the slow path.
> lock_sock_bh() now returns a boolean to say if the slow path was taken,
> and this boolean is used at unlock_sock_bh time to call the appropriate
> unlock function.
> 
> After this change, BH are either disabled or enabled during the
> lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>

Looks good, I'll wait for positive testing from Anton before applying
this.

^ permalink raw reply

* Re: [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Eric Dumazet @ 2010-05-27  5:20 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <1274936763.2542.42.camel@edumazet-laptop>

Le jeudi 27 mai 2010 à 07:06 +0200, Eric Dumazet a écrit :
> Le mercredi 26 mai 2010 à 21:21 -0700, David Miller a écrit :
> > From: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Thu, 27 May 2010 06:18:46 +0200
> > 
> > > Process context :
> > > 
> > > lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> > > 
> > > So I should add a test on owned. If set (by another thread), we should take the slow path.
> > 
> > Indeed, what you're doing now is broken.
> > 
> > If owned is non-zero when you take the spinlock you have to queue your
> > work, just like how we toss packets into the socket backlog, which is
> > processed by release_sock(), when this happens.
> 
> Here is the patch I am going to test today.
> 
> Anton, could you please test it too, just for sure ?
> 
> Thanks !
> 
> 
> [PATCH] net: fix lock_sock_bh/unlock_sock_bh
> 
> This new sock lock primitive was introduced to speedup some user context
> socket manipulation. But it is unsafe to protect two threads, one using
> regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh
> 
> This patch changes lock_sock_bh to be careful against 'owned' state.
> If owned is found to be set, we must take the slow path.
> lock_sock_bh() now returns a boolean to say if the slow path was taken,
> and this boolean is used at unlock_sock_bh time to call the appropriate
> unlock function.
> 
> After this change, BH are either disabled or enabled during the
> lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
> so we rename these functions to lock_sock_fast()/unlock_sock_fast().
> 
> Reported-by: Anton Blanchard <anton@samba.org>
> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
> ---
>  include/net/sock.h  |   20 ++++++++++++++------
>  net/core/datagram.c |    6 ++++--
>  net/core/sock.c     |   32 ++++++++++++++++++++++++++++++++
>  net/ipv4/udp.c      |   14 ++++++++------
>  net/ipv6/udp.c      |    5 +++--
>  5 files changed, 61 insertions(+), 16 deletions(-)
> 
> diff --git a/include/net/sock.h b/include/net/sock.h
> index d2a71b0..ca241ea 100644
> --- a/include/net/sock.h
> +++ b/include/net/sock.h
> @@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
>  				SINGLE_DEPTH_NESTING)
>  #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
>  
> -static inline void lock_sock_bh(struct sock *sk)
> +extern bool lock_sock_fast(struct sock *sk);
> +/**
> + * unlock_sock_fast - complement of lock_sock_fast
> + * @sk: socket
> + * @slow: slow mode
> + *
> + * fast unlock socket for user context.
> + * If slow mode is on, we call regular release_sock()
> + */
> +static inline void unlock_sock_fast(struct sock *sk, bool slow)
>  {
> -	spin_lock_bh(&sk->sk_lock.slock);
> +	if (slow)
> +		release_sock(sk);
> +	else
> +		spin_unlock_bh(&sk->sk_lock.slock);
>  }
>  
> -static inline void unlock_sock_bh(struct sock *sk)
> -{
> -	spin_unlock_bh(&sk->sk_lock.slock);
> -}
>  
>  extern struct sock		*sk_alloc(struct net *net, int family,
>  					  gfp_t priority,
> diff --git a/net/core/datagram.c b/net/core/datagram.c
> index e009753..f5b6f43 100644
> --- a/net/core/datagram.c
> +++ b/net/core/datagram.c
> @@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
>  
>  void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
>  {
> +	bool slow;
> +
>  	if (likely(atomic_read(&skb->users) == 1))
>  		smp_rmb();
>  	else if (likely(!atomic_dec_and_test(&skb->users)))
>  		return;
>  
> -	lock_sock_bh(sk);
> +	slow = lock_sock_fast(sk);
>  	skb_orphan(skb);
>  	sk_mem_reclaim_partial(sk);
> -	unlock_sock_bh(sk);
> +	unlock_sock_fast(sk, slow);
>  
>  	/* skb is now orphaned, can be freed outside of locked section */
>  	__kfree_skb(skb);
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 37fe9b6..7ab6398 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2007,6 +2007,38 @@ void release_sock(struct sock *sk)
>  }
>  EXPORT_SYMBOL(release_sock);
>  
> +/**
> + * lock_sock_fast - fast version of lock_sock
> + * @sk: socket
> + *
> + * This version should be used for very small section, where process wont block
> + * return false if fast path is taken
> + *   sk_lock.slock locked, owned = 0, BH disabled
> + * return true if slow path is taken
> + *   sk_lock.slock unlocked, owned = 1, BH enabled
> + */
> +bool lock_sock_fast(struct sock *sk)
> +{
> +	might_sleep();
> +	spin_lock_bh(&sk->sk_lock.slock);
> +
> +	if (!sk->sk_lock.owned)
> +		/*
> +		 * Note : We must disable BH or risk a deadlock
> +		 */
> +		return false;
> +
> +	__lock_sock(sk);
> +	sk->sk_lock.owned = 1;
> +	spin_unlock(&sk->sk_lock.slock);
> +	/*
> +	 * The sk_lock has mutex_lock() semantics here:
> +	 */
> +	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);

Oops, I missed a local_bh_enable(); here

> +	return true;
> +}
> +EXPORT_SYMBOL(lock_sock_fast);
> +
>  int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
>  {
>  	struct timeval tv;

Here is v2 of patch with local_bh_enable(); added in lock_sock_fast()

[PATCH v2] net: fix lock_sock_bh/unlock_sock_bh

This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   20 ++++++++++++++------
 net/core/datagram.c |    6 ++++--
 net/core/sock.c     |   33 +++++++++++++++++++++++++++++++++
 net/ipv4/udp.c      |   14 ++++++++------
 net/ipv6/udp.c      |    5 +++--
 5 files changed, 62 insertions(+), 16 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index d2a71b0..ca241ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-static inline void lock_sock_bh(struct sock *sk)
+extern bool lock_sock_fast(struct sock *sk);
+/**
+ * unlock_sock_fast - complement of lock_sock_fast
+ * @sk: socket
+ * @slow: slow mode
+ *
+ * fast unlock socket for user context.
+ * If slow mode is on, we call regular release_sock()
+ */
+static inline void unlock_sock_fast(struct sock *sk, bool slow)
 {
-	spin_lock_bh(&sk->sk_lock.slock);
+	if (slow)
+		release_sock(sk);
+	else
+		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
-static inline void unlock_sock_bh(struct sock *sk)
-{
-	spin_unlock_bh(&sk->sk_lock.slock);
-}
 
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e009753..f5b6f43 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
+	bool slow;
+
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
 
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
 	__kfree_skb(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 37fe9b6..2cf7f9f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2007,6 +2007,39 @@ void release_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(release_sock);
 
+/**
+ * lock_sock_fast - fast version of lock_sock
+ * @sk: socket
+ *
+ * This version should be used for very small section, where process wont block
+ * return false if fast path is taken
+ *   sk_lock.slock locked, owned = 0, BH disabled
+ * return true if slow path is taken
+ *   sk_lock.slock unlocked, owned = 1, BH enabled
+ */
+bool lock_sock_fast(struct sock *sk)
+{
+	might_sleep();
+	spin_lock_bh(&sk->sk_lock.slock);
+
+	if (!sk->sk_lock.owned)
+		/*
+		 * Note : We must disable BH
+		 */
+		return false;
+
+	__lock_sock(sk);
+	sk->sk_lock.owned = 1;
+	spin_unlock(&sk->sk_lock.slock);
+	/*
+	 * The sk_lock has mutex_lock() semantics here:
+	 */
+	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+	local_bh_enable();
+	return true;
+}
+EXPORT_SYMBOL(lock_sock_fast);
+
 int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct timeval tv;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9de6a69..b9d0d40 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1063,10 +1063,11 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock_bh(sk);
+		bool slow = lock_sock_fast(sk);
+
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		unlock_sock_bh(sk);
+		unlock_sock_fast(sk, slow);
 	}
 	return res;
 }
@@ -1123,6 +1124,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int peeked;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+	bool slow;
 
 	/*
 	 *	Check any passed addresses
@@ -1197,10 +1199,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1625,9 +1627,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock_bh(sk);
+	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3d7a2c0..87be586 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -328,6 +328,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
+	bool slow;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -424,7 +425,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +434,7 @@ csum_copy_err:
 			UDP6_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;



^ permalink raw reply related

* [PATCH] net: fix lock_sock_bh/unlock_sock_bh
From: Eric Dumazet @ 2010-05-27  5:06 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <20100526.212105.212688828.davem@davemloft.net>

Le mercredi 26 mai 2010 à 21:21 -0700, David Miller a écrit :
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Thu, 27 May 2010 06:18:46 +0200
> 
> > Process context :
> > 
> > lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> > 
> > So I should add a test on owned. If set (by another thread), we should take the slow path.
> 
> Indeed, what you're doing now is broken.
> 
> If owned is non-zero when you take the spinlock you have to queue your
> work, just like how we toss packets into the socket backlog, which is
> processed by release_sock(), when this happens.

Here is the patch I am going to test today.

Anton, could you please test it too, just for sure ?

Thanks !


[PATCH] net: fix lock_sock_bh/unlock_sock_bh

This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 include/net/sock.h  |   20 ++++++++++++++------
 net/core/datagram.c |    6 ++++--
 net/core/sock.c     |   32 ++++++++++++++++++++++++++++++++
 net/ipv4/udp.c      |   14 ++++++++------
 net/ipv6/udp.c      |    5 +++--
 5 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index d2a71b0..ca241ea 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1026,15 +1026,23 @@ extern void release_sock(struct sock *sk);
 				SINGLE_DEPTH_NESTING)
 #define bh_unlock_sock(__sk)	spin_unlock(&((__sk)->sk_lock.slock))
 
-static inline void lock_sock_bh(struct sock *sk)
+extern bool lock_sock_fast(struct sock *sk);
+/**
+ * unlock_sock_fast - complement of lock_sock_fast
+ * @sk: socket
+ * @slow: slow mode
+ *
+ * fast unlock socket for user context.
+ * If slow mode is on, we call regular release_sock()
+ */
+static inline void unlock_sock_fast(struct sock *sk, bool slow)
 {
-	spin_lock_bh(&sk->sk_lock.slock);
+	if (slow)
+		release_sock(sk);
+	else
+		spin_unlock_bh(&sk->sk_lock.slock);
 }
 
-static inline void unlock_sock_bh(struct sock *sk)
-{
-	spin_unlock_bh(&sk->sk_lock.slock);
-}
 
 extern struct sock		*sk_alloc(struct net *net, int family,
 					  gfp_t priority,
diff --git a/net/core/datagram.c b/net/core/datagram.c
index e009753..f5b6f43 100644
--- a/net/core/datagram.c
+++ b/net/core/datagram.c
@@ -229,15 +229,17 @@ EXPORT_SYMBOL(skb_free_datagram);
 
 void skb_free_datagram_locked(struct sock *sk, struct sk_buff *skb)
 {
+	bool slow;
+
 	if (likely(atomic_read(&skb->users) == 1))
 		smp_rmb();
 	else if (likely(!atomic_dec_and_test(&skb->users)))
 		return;
 
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	skb_orphan(skb);
 	sk_mem_reclaim_partial(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	/* skb is now orphaned, can be freed outside of locked section */
 	__kfree_skb(skb);
diff --git a/net/core/sock.c b/net/core/sock.c
index 37fe9b6..7ab6398 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2007,6 +2007,38 @@ void release_sock(struct sock *sk)
 }
 EXPORT_SYMBOL(release_sock);
 
+/**
+ * lock_sock_fast - fast version of lock_sock
+ * @sk: socket
+ *
+ * This version should be used for very small section, where process wont block
+ * return false if fast path is taken
+ *   sk_lock.slock locked, owned = 0, BH disabled
+ * return true if slow path is taken
+ *   sk_lock.slock unlocked, owned = 1, BH enabled
+ */
+bool lock_sock_fast(struct sock *sk)
+{
+	might_sleep();
+	spin_lock_bh(&sk->sk_lock.slock);
+
+	if (!sk->sk_lock.owned)
+		/*
+		 * Note : We must disable BH or risk a deadlock
+		 */
+		return false;
+
+	__lock_sock(sk);
+	sk->sk_lock.owned = 1;
+	spin_unlock(&sk->sk_lock.slock);
+	/*
+	 * The sk_lock has mutex_lock() semantics here:
+	 */
+	mutex_acquire(&sk->sk_lock.dep_map, 0, 0, _RET_IP_);
+	return true;
+}
+EXPORT_SYMBOL(lock_sock_fast);
+
 int sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
 {
 	struct timeval tv;
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 9de6a69..b9d0d40 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1063,10 +1063,11 @@ static unsigned int first_packet_length(struct sock *sk)
 	spin_unlock_bh(&rcvq->lock);
 
 	if (!skb_queue_empty(&list_kill)) {
-		lock_sock_bh(sk);
+		bool slow = lock_sock_fast(sk);
+
 		__skb_queue_purge(&list_kill);
 		sk_mem_reclaim_partial(sk);
-		unlock_sock_bh(sk);
+		unlock_sock_fast(sk, slow);
 	}
 	return res;
 }
@@ -1123,6 +1124,7 @@ int udp_recvmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 	int peeked;
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
+	bool slow;
 
 	/*
 	 *	Check any passed addresses
@@ -1197,10 +1199,10 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags))
 		UDP_INC_STATS_USER(sock_net(sk), UDP_MIB_INERRORS, is_udplite);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (noblock)
 		return -EAGAIN;
@@ -1625,9 +1627,9 @@ int udp_rcv(struct sk_buff *skb)
 
 void udp_destroy_sock(struct sock *sk)
 {
-	lock_sock_bh(sk);
+	bool slow = lock_sock_fast(sk);
 	udp_flush_pending_frames(sk);
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 }
 
 /*
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 3d7a2c0..87be586 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -328,6 +328,7 @@ int udpv6_recvmsg(struct kiocb *iocb, struct sock *sk,
 	int err;
 	int is_udplite = IS_UDPLITE(sk);
 	int is_udp4;
+	bool slow;
 
 	if (addr_len)
 		*addr_len=sizeof(struct sockaddr_in6);
@@ -424,7 +425,7 @@ out:
 	return err;
 
 csum_copy_err:
-	lock_sock_bh(sk);
+	slow = lock_sock_fast(sk);
 	if (!skb_kill_datagram(sk, skb, flags)) {
 		if (is_udp4)
 			UDP_INC_STATS_USER(sock_net(sk),
@@ -433,7 +434,7 @@ csum_copy_err:
 			UDP6_INC_STATS_USER(sock_net(sk),
 					UDP_MIB_INERRORS, is_udplite);
 	}
-	unlock_sock_bh(sk);
+	unlock_sock_fast(sk, slow);
 
 	if (flags & MSG_DONTWAIT)
 		return -EAGAIN;



^ permalink raw reply related

* Re: [PATCH 3/3] net: deliver skbs on inactive slaves to exact matches
From: John Fastabend @ 2010-05-27  4:52 UTC (permalink / raw)
  To: andy@greyhouse.net, fubar@us.ibm.com, nhorman@tuxdriver.com,
	"bonding-devel@lists.sourceforge.net" <bonding-dev
In-Reply-To: <20100513073117.3528.33132.stgit@jf-dev2-dcblab>

Fastabend, John R wrote:
> Currently, the accelerated receive path for VLAN's will
> drop packets if the real device is an inactive slave and
> is not one of the special pkts tested for in
> skb_bond_should_drop().  This behavior is different then
> the non-accelerated path and for pkts over a bonded vlan.
> 
> For example,
> 
> vlanx -> bond0 -> ethx
> 
> will be dropped in the vlan path and not delivered to any
> packet handlers at all.  However,
> 
> bond0 -> vlanx -> ethx
> 
> and
> 
> bond0 -> ethx
> 
> will be delivered to handlers that match the exact dev,
> because the VLAN path checks the real_dev which is not a
> slave and netif_recv_skb() doesn't drop frames but only
> delivers them to exact matches.
> 
> This patch adds a sk_buff flag which is used for tagging
> skbs that would previously been dropped and allows the
> skb to continue to skb_netif_recv().  Here we add
> logic to check for the bond_should_drop flag and if it
> is set only deliver to handlers that match exactly.  This
> makes both paths above consistent and gives pkt handlers
> a way to identify skbs that come from inactive slaves.
> Without this patch in some configurations skbs will be
> delivered to handlers with exact matches and in others
> be dropped out right in the vlan path.
> 
> I have tested the following 4 configurations in failover modes
> and load balancing modes.
> 
> # bond0 -> ethx
> 
> # vlanx -> bond0 -> ethx
> 
> # bond0 -> vlanx -> ethx
> 
> # bond0 -> ethx
>             |
>   vlanx -> --
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
> 

Jay,

I know you were looking over this series at one point.  Did you have any 
comments?  Sorry for the ping just wanted to keep this on my radar.

Thanks,
John

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Eric Dumazet @ 2010-05-27  4:21 UTC (permalink / raw)
  To: David Miller; +Cc: anton, netdev
In-Reply-To: <20100526.210600.242135655.davem@davemloft.net>

Le mercredi 26 mai 2010 à 21:06 -0700, David Miller a écrit :
> From: Anton Blanchard <anton@samba.org>
> Date: Thu, 27 May 2010 13:56:17 +1000
> 
> > I'm somewhat confused by the two stage locking in the socket lock
> > (ie sk_lock.slock and sk_lock.owned).
> > 
> > What state should the socket lock be in for serialising updates of
> > sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> > unlocked, sk_lock.owned = 1:
> 
> If sh_lock.owned=1 the user has grabbed exclusive sleepable lock on the
> socket, it does this via something approximating:
> 
> retry:
> 	spin_lock(&sk_lock.slock);
> 	was_locked = sk_lock.owned;
> 	if (!was_locked)
> 		sk_lock.owned = 1;
> 	spin_unlock(&sk_lock.slock);
> 	if (was_locked) {
> 		sleep_on(condition(sk_lock.owned));
> 		goto retry;
> 	}
> 
> This allows user context code to sleep with exclusive access to the
> socket.
> 
> Code that cannot sleep takes the spinlock, and queues the work if the
> owner field if non-zero.  Else, it keeps the spinlock held and does
> the work.
> 
> In either case, socket modifications are thus done completely protected
> from other contexts.
> 
> 

Yes, but not on the case one user context uses the 'slow locking', and
another uses the 'fast locking'.

I am working on a patch now.





^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: David Miller @ 2010-05-27  4:21 UTC (permalink / raw)
  To: eric.dumazet; +Cc: anton, netdev
In-Reply-To: <1274933926.2542.26.camel@edumazet-laptop>

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Thu, 27 May 2010 06:18:46 +0200

> Process context :
> 
> lock only (sk_lock.slock = locked, sk_lock.owned = ???)
> 
> So I should add a test on owned. If set (by another thread), we should take the slow path.

Indeed, what you're doing now is broken.

If owned is non-zero when you take the spinlock you have to queue your
work, just like how we toss packets into the socket backlog, which is
processed by release_sock(), when this happens.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Eric Dumazet @ 2010-05-27  4:18 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: David Miller, netdev
In-Reply-To: <20100527035617.GB28295@kryten>

Le jeudi 27 mai 2010 à 13:56 +1000, Anton Blanchard a écrit :
> Hi Eric,
> 
> > You are 100% right David, maybe we should add a test when changing
> > sk_forward_alloc to test if socket is locked (lockdep only test), but
> > that's for 2.6.36 :)
> 
> Thanks for the patch, unfortunately I can still hit the WARN_ON. I'm somewhat
> confused by the two stage locking in the socket lock (ie sk_lock.slock and
> sk_lock.owned).
> 
> What state should the socket lock be in for serialising updates of
> sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> unlocked, sk_lock.owned = 1:
> 
> NIP [c0000000005b4ad0] .sock_queue_rcv_skb
> LR [c0000000005b4acc] .sock_queue_rcv_skb
> Call Trace:
> [c0000000005f9fcc] .ip_queue_rcv_skb
> [c00000000061d604] .__udp_queue_rcv_skb
> [c0000000005b1a38] .release_sock
> [c0000000006205f0] .udp_sendmsg
> [c0000000006290d4] .inet_sendmsg
> [c0000000005abfb4] .sock_sendmsg
> [c0000000005ae9dc] .SyS_sendto
> [c0000000005ab6c0] .SyS_send
> 
> And other times we update it with sk_lock.slock = locked, sk_lock.owned = 0:
> 
> NIP [c0000000005b2b6c] .sock_rfree
> LR [c0000000005b2b68] .sock_rfree
> Call Trace:
> [c0000000005bca10] .skb_free_datagram_locked
> [c00000000061fe88] .udp_recvmsg
> [c0000000006285e8] .inet_recvmsg
> [c0000000005abe0c] .sock_recvmsg
> [c0000000005ae358] .SyS_recvfrom
> 
> I see we sometimes take sk_lock.slock then check the owned field, but we
> aren't doing that all the time.
> 

Old rule was :

A Process context was using 
lock + test_and_set_or_sleep + unlock (sk_lock.slock = unlocked,
sk_lock.owned = 1)

softirq handler was using :
(sk_lock.slock = locked, sk_lock.owned =0)

I added a shortcut, but it seems wrong as is

Process context :

lock only (sk_lock.slock = locked, sk_lock.owned = ???)

So I should add a test on owned. If set (by another thread), we should take the slow path.

Thanks !



^ permalink raw reply

* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: John Williams @ 2010-05-27  4:12 UTC (permalink / raw)
  To: John Linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, michal.simek,
	Brian Hill
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>

On Thu, May 27, 2010 at 3:29 AM, John Linn <john.linn@xilinx.com> wrote:
> The code is not checking the interrupt for DMA correctly so that an
> interrupt number of 0 will cause a false error.
>
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>
> ---
>  drivers/net/ll_temac_main.c |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/ll_temac_main.c b/drivers/net/ll_temac_main.c
> index fa7620e..0615737 100644
> --- a/drivers/net/ll_temac_main.c
> +++ b/drivers/net/ll_temac_main.c
> @@ -950,7 +950,7 @@ temac_of_probe(struct of_device *op, const struct of_device_id *match)
>
>        lp->rx_irq = irq_of_parse_and_map(np, 0);
>        lp->tx_irq = irq_of_parse_and_map(np, 1);
> -       if (!lp->rx_irq || !lp->tx_irq) {
> +       if ((lp->rx_irq == NO_IRQ) || (lp->tx_irq == NO_IRQ)) {

Personally I think this is the right thing to do.  But, I thought the
IRQ 0 == NO_IRQ (AKA "all-the-world's-an-x86-and-if-not-it-should-be")
holy war was already fought and won (or lost, depending on your
perspective)?

I seem to recall giving reluctant assent to a patch from Grant a few
months ago that touched MicroBlaze thus?

John

^ permalink raw reply

* Re: [PATCH 9/17] net/iucv: Add missing spin_unlock
From: David Miller @ 2010-05-27  4:10 UTC (permalink / raw)
  To: julia
  Cc: ursula.braun, linux390, linux-s390, netdev, linux-kernel,
	kernel-janitors
In-Reply-To: <Pine.LNX.4.64.1005261756330.23743@ask.diku.dk>

From: Julia Lawall <julia@diku.dk>
Date: Wed, 26 May 2010 17:56:48 +0200 (CEST)

> Add a spin_unlock missing on the error path.  There seems like no reason
> why the lock should continue to be held if the kzalloc fail.

Applied, thanks.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: David Miller @ 2010-05-27  4:06 UTC (permalink / raw)
  To: anton; +Cc: eric.dumazet, netdev
In-Reply-To: <20100527035617.GB28295@kryten>

From: Anton Blanchard <anton@samba.org>
Date: Thu, 27 May 2010 13:56:17 +1000

> I'm somewhat confused by the two stage locking in the socket lock
> (ie sk_lock.slock and sk_lock.owned).
> 
> What state should the socket lock be in for serialising updates of
> sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
> unlocked, sk_lock.owned = 1:

If sh_lock.owned=1 the user has grabbed exclusive sleepable lock on the
socket, it does this via something approximating:

retry:
	spin_lock(&sk_lock.slock);
	was_locked = sk_lock.owned;
	if (!was_locked)
		sk_lock.owned = 1;
	spin_unlock(&sk_lock.slock);
	if (was_locked) {
		sleep_on(condition(sk_lock.owned));
		goto retry;
	}

This allows user context code to sleep with exclusive access to the
socket.

Code that cannot sleep takes the spinlock, and queues the work if the
owner field if non-zero.  Else, it keeps the spinlock held and does
the work.

In either case, socket modifications are thus done completely protected
from other contexts.

^ permalink raw reply

* Re: Warning in net/ipv4/af_inet.c:154
From: Anton Blanchard @ 2010-05-27  3:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Miller, netdev
In-Reply-To: <1274868776.2672.96.camel@edumazet-laptop>

Hi Eric,

> You are 100% right David, maybe we should add a test when changing
> sk_forward_alloc to test if socket is locked (lockdep only test), but
> that's for 2.6.36 :)

Thanks for the patch, unfortunately I can still hit the WARN_ON. I'm somewhat
confused by the two stage locking in the socket lock (ie sk_lock.slock and
sk_lock.owned).

What state should the socket lock be in for serialising updates of
sk_forward_alloc? In some cases we appear to update it with sk_lock.slock =
unlocked, sk_lock.owned = 1:

NIP [c0000000005b4ad0] .sock_queue_rcv_skb
LR [c0000000005b4acc] .sock_queue_rcv_skb
Call Trace:
[c0000000005f9fcc] .ip_queue_rcv_skb
[c00000000061d604] .__udp_queue_rcv_skb
[c0000000005b1a38] .release_sock
[c0000000006205f0] .udp_sendmsg
[c0000000006290d4] .inet_sendmsg
[c0000000005abfb4] .sock_sendmsg
[c0000000005ae9dc] .SyS_sendto
[c0000000005ab6c0] .SyS_send

And other times we update it with sk_lock.slock = locked, sk_lock.owned = 0:

NIP [c0000000005b2b6c] .sock_rfree
LR [c0000000005b2b68] .sock_rfree
Call Trace:
[c0000000005bca10] .skb_free_datagram_locked
[c00000000061fe88] .udp_recvmsg
[c0000000006285e8] .inet_recvmsg
[c0000000005abe0c] .sock_recvmsg
[c0000000005ae358] .SyS_recvfrom

I see we sometimes take sk_lock.slock then check the owned field, but we
aren't doing that all the time.

Anton

^ permalink raw reply

* Re: [PATCH 2/2] net: ll_temac: fix checksum offload logic
From: David Miller @ 2010-05-27  3:45 UTC (permalink / raw)
  To: john.linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, brian.hill
In-Reply-To: <0690e798-1e5e-4416-90dc-04a9df16600f@SG2EHSMHS005.ehs.local>

aFrom: John Linn <john.linn@xilinx.com>
Date: Wed, 26 May 2010 11:29:19 -0600

> The current checksum offload code does not work and this corrects
> that functionality. It also updates the interrupt coallescing
> initialization so than there are fewer interrupts and performance
> is increased.
> 
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>

Applied, although I changed "temac_features" to be explicitly
"unsigned int" instead of just plain "unsigned"


^ permalink raw reply

* Re: [PATCH 1/2] net: ll_temac: fix interrupt bug when interrupt 0 is used
From: David Miller @ 2010-05-27  3:44 UTC (permalink / raw)
  To: john.linn
  Cc: netdev, linuxppc-dev, grant.likely, jwboyer, john.williams,
	michal.simek, brian.hill
In-Reply-To: <6f264a7a-e5da-494a-a24d-1578ca422807@VA3EHSMHS016.ehs.local>

From: John Linn <john.linn@xilinx.com>
Date: Wed, 26 May 2010 11:29:18 -0600

> The code is not checking the interrupt for DMA correctly so that an
> interrupt number of 0 will cause a false error.
> 
> Signed-off-by: Brian Hill <brian.hill@xilinx.com>
> Signed-off-by: John Linn <john.linn@xilinx.com>

Applied.

^ permalink raw reply

* Re: [PATCH] tcp: Socket option to set congestion window
From: David Miller @ 2010-05-27  3:04 UTC (permalink / raw)
  To: hagen; +Cc: andi, therbert, shemminger, netdev, ycheng
In-Reply-To: <20100526231512.GD2684@nuttenaction>

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Thu, 27 May 2010 01:15:12 +0200

> How can a domain defined as {process,peer-IP} fair to the 1MB
> bottleneck link?

You're asking about a network level issue in terms of what can be done
on a local end-node.

All an end-node can do is abide by congestion control rules and respond
to packet drops, as has been going on for decades.

People have basically (especially in Europe) given up on crazy crap
like RSVP and other forms of bandwidth limiting and reservation.  They
just oversubscribe their links, and increase their capacity as traffic
increases dictate.  It just isn't all that manageable to put people's
traffic into classes and control what they do on a large scale.

I'm also skeptical about those who say the fight belongs squarely at
the end nodes.  If you want to control the network traffic of the
meeting point of your dumbbell, you'll need a machine there doing RED
or traffic limiting.  End-host schemes simply aren't going to work
because I can just add more end-hosts to reintroduce the problem.

The dumbbell situation is independant of the end-node issues, that's
all I'm really saying.

^ permalink raw reply

* UDP Fragmentation and DF bit..
From: Vincent, Pradeep @ 2010-05-27  1:45 UTC (permalink / raw)
  To: netdev@vger.kernel.org

After running into issues with UDP in multi-MTU network environment, I
started digging through the code and found somewhat inconsistent behavior
(if I read the code right)

OMan 7 ip¹ declares that ³The  system-wide  default  is  controlled  by  the
ip_no_pmtu_disc  sysctl  for SOCK_STREAM  sockets,  and  disabled  on all
others.² which led me to think ODF¹ bit will not be set for UDP packets.
But..

The code in ip_output.c seems to do the following,

1. If packet size <= current PMTU, then set the DF=1 and send the packet
out.
2. If packet size > current PMTU, then set DF=0 and send the packet out
after fragmentation.

In a network environment where MTU-big and MTU-small co-exist (and have
router¹s fragmentation turned off in favor of PMTU discovery), UDP packets
that are > MTU-small and < MTU-big find the PMTU effectively but UDP packets
that are > MTU-big get dropped. This looks like inconsistent behavior to me
and doesn¹t seem to match the advertised behavior.

Is there a reason why PMTU support for UDP is somewhat inconsistent ?

Is there a reason ODF¹ bit cannot be set on fragmented packets on UDP
transmission ? I couldn¹t find anything in RFC for IP protocol that
prohibited DF bit on fragmented packets. Did I miss
something here ?

Would it be reasonable if PMTU discovery is performed (DF bit set +
appropriate icmp logic) even for locally fragmented packets ? I think this
will be a great help for UDP users that don¹t have PMTU handling logic in
the application (most udp applications belong to this category in my
experience). Thoughts ?

Thanks,

- Pradeep Vincent

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox