Netdev List

Netdev List
 help / color / mirror / Atom feed

* Re: [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Vlad Yasevich @ 2012-12-06 19:14 UTC (permalink / raw)
  To: Thomas Graf; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <20121206190835.GF16122@casper.infradead.org>

On 12/06/2012 02:08 PM, Thomas Graf wrote:
> On 12/06/12 at 01:57pm, Vlad Yasevich wrote:
>> On 12/06/2012 01:44 PM, Thomas Graf wrote:
>>> On 12/06/12 at 01:35pm, Vlad Yasevich wrote:
>>>> We may want to mark transports as dead sooner.  Probably right about
>>>> the time we pull them off the list.
>>>
>>> We mark it dead in sctp_transport_free() which is called at the
>>> end of sctp_assoc_rm_peer(). Do you want to mark it dead at the
>>> beginning of sctp_assoc_rm_peer() as well? (We still need to
>>> mark in sctp_transport_free() anyway).
>>
>> Crud..  sctp_transport_free() is called directly in places...   Hmm...
>> the one in sctp_association_free() may need to be list_del_rcu()...
>
> It's not really needed but it wouldn't be wrong from a
> documentation perspective. The assoc is always unhashed
> while holding head->lock before sctp_association_free()
> and all current RCU readers of transport_addr_list access
> the the assoc while holding a read-lock on head->lock.
>
> Let me respin this patch and do a list_del_rcu() there
> to document the RCU'iness of it.
>

Right, but there may be chunks that have cached association with a ref
before sctp_association_free() is called.  Now, after free they may
be looking at the transport list for whatever reason...  Most places
check assoc->dead, but I don't want to get caught.  So, there is
a remote chance that someone may look at transports and would crash
without rcu.

-vlad

^ permalink raw reply

* Re: [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Thomas Graf @ 2012-12-06 19:08 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <50C0EAB5.3050303@gmail.com>

On 12/06/12 at 01:57pm, Vlad Yasevich wrote:
> On 12/06/2012 01:44 PM, Thomas Graf wrote:
> >On 12/06/12 at 01:35pm, Vlad Yasevich wrote:
> >>We may want to mark transports as dead sooner.  Probably right about
> >>the time we pull them off the list.
> >
> >We mark it dead in sctp_transport_free() which is called at the
> >end of sctp_assoc_rm_peer(). Do you want to mark it dead at the
> >beginning of sctp_assoc_rm_peer() as well? (We still need to
> >mark in sctp_transport_free() anyway).
> 
> Crud..  sctp_transport_free() is called directly in places...   Hmm...
> the one in sctp_association_free() may need to be list_del_rcu()...

It's not really needed but it wouldn't be wrong from a
documentation perspective. The assoc is always unhashed
while holding head->lock before sctp_association_free()
and all current RCU readers of transport_addr_list access
the the assoc while holding a read-lock on head->lock.

Let me respin this patch and do a list_del_rcu() there
to document the RCU'iness of it.

^ permalink raw reply

* Re: [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Vlad Yasevich @ 2012-12-06 18:57 UTC (permalink / raw)
  To: Thomas Graf; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <20121206184433.GE16122@casper.infradead.org>

On 12/06/2012 01:44 PM, Thomas Graf wrote:
> On 12/06/12 at 01:35pm, Vlad Yasevich wrote:
>> We may want to mark transports as dead sooner.  Probably right about
>> the time we pull them off the list.
>
> We mark it dead in sctp_transport_free() which is called at the
> end of sctp_assoc_rm_peer(). Do you want to mark it dead at the
> beginning of sctp_assoc_rm_peer() as well? (We still need to
> mark in sctp_transport_free() anyway).

Crud..  sctp_transport_free() is called directly in places...   Hmm...
the one in sctp_association_free() may need to be list_del_rcu()...

Ok, we can leave the dead handling the way it is..

-vlad

>
>> When displaying, we may want to
>> look at transport->dead, and skip them.  It will reduce the probability
>> that we would be looking at a transport that's about to go away.
>
> Agreed.
>

^ permalink raw reply

* Re: [PATCH] net : enable tx time stamping in the vde driver.
From: Richard Cochran @ 2012-12-06 18:57 UTC (permalink / raw)
  To: Paul Chavent; +Cc: jdike, richard, user-mode-linux-devel, netdev
In-Reply-To: <1354807505-21222-1-git-send-email-paul.chavent@onera.fr>

On Thu, Dec 06, 2012 at 04:25:05PM +0100, Paul Chavent wrote:
> This new version moves the skb_tx_timestamp in the main uml
> driver. This should avoid the need to call this function in each
> transport (vde, slirp, tuntap, ...). It also add support for ethtool
> get_ts_info.
> 
> Signed-off-by: Paul Chavent <paul.chavent@onera.fr>

Acked-by: Richard Cochran <richardcochran@gmail.com>

^ permalink raw reply

* Re: [PATCH net next] team: remove team parameter from port_enter callback.
From: Jiri Pirko @ 2012-12-06 18:50 UTC (permalink / raw)
  To: Rami Rosen; +Cc: davem, netdev
In-Reply-To: <1354799382-23808-1-git-send-email-ramirose@gmail.com>

Thu, Dec 06, 2012 at 02:09:42PM CET, ramirose@gmail.com wrote:
>This patch removes an unused parameter (team) from port_enter callback in 
>team_mode_ops and fixes accordingly its invocations in 3 modes and in team.c.

Please do not do this. This parameter is ment to be used in future.

Thanks.

Jiri

>
>Signed-off-by: Rami Rosen <ramirose@gmail.com>
>---
> drivers/net/team/team.c                  | 2 +-
> drivers/net/team/team_mode_broadcast.c   | 2 +-
> drivers/net/team/team_mode_loadbalance.c | 2 +-
> drivers/net/team/team_mode_roundrobin.c  | 2 +-
> include/linux/if_team.h                  | 2 +-
> 5 files changed, 5 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>index ad86660..4203808 100644
>--- a/drivers/net/team/team.c
>+++ b/drivers/net/team/team.c
>@@ -887,7 +887,7 @@ static int team_port_enter(struct team *team, struct team_port *port)
> 	dev_hold(team->dev);
> 	port->dev->priv_flags |= IFF_TEAM_PORT;
> 	if (team->ops.port_enter) {
>-		err = team->ops.port_enter(team, port);
>+		err = team->ops.port_enter(port);
> 		if (err) {
> 			netdev_err(team->dev, "Device %s failed to enter team mode\n",
> 				   port->dev->name);
>diff --git a/drivers/net/team/team_mode_broadcast.c b/drivers/net/team/team_mode_broadcast.c
>index c5db428..c3840bc 100644
>--- a/drivers/net/team/team_mode_broadcast.c
>+++ b/drivers/net/team/team_mode_broadcast.c
>@@ -46,7 +46,7 @@ static bool bc_transmit(struct team *team, struct sk_buff *skb)
> 	return sum_ret;
> }
> 
>-static int bc_port_enter(struct team *team, struct team_port *port)
>+static int bc_port_enter(struct team_port *port)
> {
> 	return team_port_set_team_dev_addr(port);
> }
>diff --git a/drivers/net/team/team_mode_loadbalance.c b/drivers/net/team/team_mode_loadbalance.c
>index cdc31b5..1e84c77 100644
>--- a/drivers/net/team/team_mode_loadbalance.c
>+++ b/drivers/net/team/team_mode_loadbalance.c
>@@ -614,7 +614,7 @@ static void lb_exit(struct team *team)
> 	kfree(lb_priv->ex);
> }
> 
>-static int lb_port_enter(struct team *team, struct team_port *port)
>+static int lb_port_enter(struct team_port *port)
> {
> 	struct lb_port_priv *lb_port_priv = get_lb_port_priv(port);
> 
>diff --git a/drivers/net/team/team_mode_roundrobin.c b/drivers/net/team/team_mode_roundrobin.c
>index 105135a..abe889e 100644
>--- a/drivers/net/team/team_mode_roundrobin.c
>+++ b/drivers/net/team/team_mode_roundrobin.c
>@@ -64,7 +64,7 @@ drop:
> 	return false;
> }
> 
>-static int rr_port_enter(struct team *team, struct team_port *port)
>+static int rr_port_enter(struct team_port *port)
> {
> 	return team_port_set_team_dev_addr(port);
> }
>diff --git a/include/linux/if_team.h b/include/linux/if_team.h
>index 0245def..3366453 100644
>--- a/include/linux/if_team.h
>+++ b/include/linux/if_team.h
>@@ -105,7 +105,7 @@ struct team_mode_ops {
> 				       struct team_port *port,
> 				       struct sk_buff *skb);
> 	bool (*transmit)(struct team *team, struct sk_buff *skb);
>-	int (*port_enter)(struct team *team, struct team_port *port);
>+	int (*port_enter)(struct team_port *port);
> 	void (*port_leave)(struct team *team, struct team_port *port);
> 	void (*port_change_dev_addr)(struct team *team, struct team_port *port);
> 	void (*port_enabled)(struct team *team, struct team_port *port);
>-- 
>1.7.11.7
>

^ permalink raw reply

* [PATCH] tcp: bug fix Fast Open client retransmission
From: Yuchung Cheng @ 2012-12-06 18:45 UTC (permalink / raw)
  To: davem, ncardwell, nanditad, edumazet; +Cc: netdev, Yuchung Cheng

If SYN-ACK partially acks SYN-data, the client retransmits the
remaining data by tcp_retransmit_skb(). This increments lost recovery
state variables like tp->retrans_out in Open state. If loss recovery
happens before the retransmission is acked, it triggers the WARN_ON
check in tcp_fastretrans_alert(). For example: the client sends
SYN-data, gets SYN-ACK acking only ISN, retransmits data, sends
another 4 data packets and get 3 dupacks.

Since the retransmission is not caused by network drop it should not
update the recovery state variables. Further the server may return a
smaller MSS than the cached MSS used for SYN-data, so the retranmission
needs a loop. Otherwise some data will not be retransmitted until timeout
or other loss recovery events.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
---
 include/net/tcp.h     |    1 +
 net/ipv4/tcp_input.c  |    6 +++++-
 net/ipv4/tcp_output.c |   15 ++++++++++-----
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3202bde..aed42c7 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -524,6 +524,7 @@ static inline __u32 cookie_v6_init_sequence(struct sock *sk,
 extern void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
 				      int nonagle);
 extern bool tcp_may_send_now(struct sock *sk);
+extern int __tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern int tcp_retransmit_skb(struct sock *, struct sk_buff *);
 extern void tcp_retransmit_timer(struct sock *sk);
 extern void tcp_xmit_retransmit_queue(struct sock *);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index fc67831..a136925 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5652,7 +5652,11 @@ static bool tcp_rcv_fastopen_synack(struct sock *sk, struct sk_buff *synack,
 	tcp_fastopen_cache_set(sk, mss, cookie, syn_drop);
 
 	if (data) { /* Retransmit unacked data in SYN */
-		tcp_retransmit_skb(sk, data);
+		tcp_for_write_queue_from(data, sk) {
+			if (data == tcp_send_head(sk) ||
+			    __tcp_retransmit_skb(sk, data))
+				break;
+		}
 		tcp_rearm_rto(sk);
 		return true;
 	}
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 8ac0855..5d45159 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2309,12 +2309,11 @@ static void tcp_retrans_try_collapse(struct sock *sk, struct sk_buff *to,
  * state updates are done by the caller.  Returns non-zero if an
  * error occurred which prevented the send.
  */
-int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+int __tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
 	unsigned int cur_mss;
-	int err;
 
 	/* Inconslusive MTU probe */
 	if (icsk->icsk_mtup.probe_size) {
@@ -2387,11 +2386,17 @@ int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
 	if (unlikely(NET_IP_ALIGN && ((unsigned long)skb->data & 3))) {
 		struct sk_buff *nskb = __pskb_copy(skb, MAX_TCP_HEADER,
 						   GFP_ATOMIC);
-		err = nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
-			     -ENOBUFS;
+		return nskb ? tcp_transmit_skb(sk, nskb, 0, GFP_ATOMIC) :
+			      -ENOBUFS;
 	} else {
-		err = tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
+		return tcp_transmit_skb(sk, skb, 1, GFP_ATOMIC);
 	}
+}
+
+int tcp_retransmit_skb(struct sock *sk, struct sk_buff *skb)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	int err = __tcp_retransmit_skb(sk, skb);
 
 	if (err == 0) {
 		/* Update global TCP statistics. */
-- 
1.7.7.3

^ permalink raw reply related

* Re: [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Thomas Graf @ 2012-12-06 18:44 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <50C0E585.1080701@gmail.com>

On 12/06/12 at 01:35pm, Vlad Yasevich wrote:
> We may want to mark transports as dead sooner.  Probably right about
> the time we pull them off the list.

We mark it dead in sctp_transport_free() which is called at the
end of sctp_assoc_rm_peer(). Do you want to mark it dead at the
beginning of sctp_assoc_rm_peer() as well? (We still need to
mark in sctp_transport_free() anyway).

> When displaying, we may want to
> look at transport->dead, and skip them.  It will reduce the probability
> that we would be looking at a transport that's about to go away.

Agreed.

^ permalink raw reply

* Re: [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Vlad Yasevich @ 2012-12-06 18:35 UTC (permalink / raw)
  To: Thomas Graf; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <16453bea94a6fc43d657139dff2ce0b5924e2a1f.1354817574.git.tgraf@suug.ch>

On 12/06/2012 01:15 PM, Thomas Graf wrote:
> peer.transport_addr_list is currently only protected by sk_sock
> which is inpractical to acquire for procfs dumping purposes.
>
> This patch adds RCU protection allowing for the procfs readers to
> enter RCU read-side critical sections.
>
> Modification of the list continues to be serialized via sk_lock.
>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>   include/net/sctp/structs.h |  2 ++
>   net/sctp/associola.c       |  4 ++--
>   net/sctp/proc.c            |  8 ++++++--
>   net/sctp/transport.c       | 18 +++++++++++++-----
>   4 files changed, 23 insertions(+), 9 deletions(-)
>
> diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
> index 64158aa..5d6987b 100644
> --- a/include/net/sctp/structs.h
> +++ b/include/net/sctp/structs.h
> @@ -948,6 +948,8 @@ struct sctp_transport {
>
>   	/* 64-bit random number sent with heartbeat. */
>   	__u64 hb_nonce;
> +
> +	struct rcu_head rcu;
>   };
>
>   struct sctp_transport *sctp_transport_new(struct net *, const union sctp_addr *,
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index b1ef3bc..c826bb8 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -565,7 +565,7 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
>   		sctp_assoc_update_retran_path(asoc);
>
>   	/* Remove this peer from the list. */
> -	list_del(&peer->transports);
> +	list_del_rcu(&peer->transports);
>
>   	/* Get the first transport of asoc. */
>   	pos = asoc->peer.transport_addr_list.next;
> @@ -765,7 +765,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
>   	peer->state = peer_state;
>
>   	/* Attach the remote transport to our asoc.  */
> -	list_add_tail(&peer->transports, &asoc->peer.transport_addr_list);
> +	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
>   	asoc->peer.transport_count++;
>
>   	/* If we do not yet have a primary path, set one.  */
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index ec9b0c8..36a3f9d 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -159,7 +159,8 @@ static void sctp_seq_dump_remote_addrs(struct seq_file *seq, struct sctp_associa
>   	struct sctp_af *af;
>
>   	primary = &assoc->peer.primary_addr;
> -	list_for_each_entry(transport, &assoc->peer.transport_addr_list,
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(transport, &assoc->peer.transport_addr_list,
>   			transports) {
>   		addr = &transport->ipaddr;
>   		af = sctp_get_af_specific(addr->sa.sa_family);
> @@ -168,6 +169,7 @@ static void sctp_seq_dump_remote_addrs(struct seq_file *seq, struct sctp_associa
>   		}
>   		af->seq_dump_addr(seq, addr);
>   	}
> +	rcu_read_unlock();
>   }
>
>   static void * sctp_eps_seq_start(struct seq_file *seq, loff_t *pos)
> @@ -438,11 +440,12 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>   	head = &sctp_assoc_hashtable[hash];
>   	sctp_local_bh_disable();
>   	read_lock(&head->lock);
> +	rcu_read_lock();
>   	sctp_for_each_hentry(epb, node, &head->chain) {
>   		if (!net_eq(sock_net(epb->sk), seq_file_net(seq)))
>   			continue;
>   		assoc = sctp_assoc(epb);
> -		list_for_each_entry(tsp, &assoc->peer.transport_addr_list,
> +		list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
>   					transports) {
>   			/*
>   			 * The remote address (ADDR)
> @@ -489,6 +492,7 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
>   		}
>   	}
>
> +	rcu_read_unlock();
>   	read_unlock(&head->lock);
>   	sctp_local_bh_enable();
>
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index 206cf52..1295aec 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -163,13 +163,11 @@ void sctp_transport_free(struct sctp_transport *transport)
>   	sctp_transport_put(transport);
>   }
>
> -/* Destroy the transport data structure.
> - * Assumes there are no more users of this structure.
> - */
> -static void sctp_transport_destroy(struct sctp_transport *transport)
> +static void sctp_transport_destroy_rcu(struct rcu_head *head)
>   {
> -	SCTP_ASSERT(transport->dead, "Transport is not dead", return);
> +	struct sctp_transport *transport;
>
> +	transport = container_of(head, struct sctp_transport, rcu);
>   	if (transport->asoc)
>   		sctp_association_put(transport->asoc);
>
> @@ -180,6 +178,16 @@ static void sctp_transport_destroy(struct sctp_transport *transport)
>   	SCTP_DBG_OBJCNT_DEC(transport);
>   }
>
> +/* Destroy the transport data structure.
> + * Assumes there are no more users of this structure.
> + */
> +static void sctp_transport_destroy(struct sctp_transport *transport)
> +{
> +	SCTP_ASSERT(transport->dead, "Transport is not dead", return);
> +
> +	call_rcu(&transport->rcu, sctp_transport_destroy_rcu);
> +}
> +
>   /* Start T3_rtx timer if it is not already running and update the heartbeat
>    * timer.  This routine is called every time a DATA chunk is sent.
>    */
>

We may want to mark transports as dead sooner.  Probably right about the 
time we pull them off the list.  When displaying, we may want to
look at transport->dead, and skip them.  It will reduce the probability
that we would be looking at a transport that's about to go away.

Otherwise, looks very nice.

-vlad

^ permalink raw reply

* Re: [PATCH] sctp: proc: protect bind_addr->address_list accesses with rcu_read_lock()
From: Thomas Graf @ 2012-12-06 18:28 UTC (permalink / raw)
  To: Vlad Yasevich; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <50C0E300.3000703@gmail.com>

On 12/06/12 at 01:25pm, Vlad Yasevich wrote:
> May want to avoid printing addresses that are !addr->valid.
> 
> Otherwise looks good.

Good point, I'll send a follow-up patch if that is OK. Same
applies to dead transports obviously.

^ permalink raw reply

* Re: [PATCH] sctp: proc: protect bind_addr->address_list accesses with rcu_read_lock()
From: Vlad Yasevich @ 2012-12-06 18:25 UTC (permalink / raw)
  To: Thomas Graf; +Cc: linux-sctp, netdev, Neil Horman
In-Reply-To: <a803093c70e857e8e01e554288ee79f66f7acda4.1354814846.git.tgraf@suug.ch>

On 12/06/2012 12:34 PM, Thomas Graf wrote:
> address_list is protected via the socket lock or RCU. Since we don't want
> to take the socket lock for each assoc we dump in procfs a RCU read-side
> critical section must be entered.
>
> Cc: Vlad Yasevich <vyasevich@gmail.com>
> Cc: Neil Horman <nhorman@tuxdriver.com>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>   net/sctp/proc.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/net/sctp/proc.c b/net/sctp/proc.c
> index 9966e7b..ec9b0c8 100644
> --- a/net/sctp/proc.c
> +++ b/net/sctp/proc.c
> @@ -139,7 +139,8 @@ static void sctp_seq_dump_local_addrs(struct seq_file *seq, struct sctp_ep_commo
>   	    primary = &peer->saddr;
>   	}
>
> -	list_for_each_entry(laddr, &epb->bind_addr.address_list, list) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(laddr, &epb->bind_addr.address_list, list) {
>   		addr = &laddr->a;
>   		af = sctp_get_af_specific(addr->sa.sa_family);
>   		if (primary && af->cmp_addr(addr, primary)) {
> @@ -147,6 +148,7 @@ static void sctp_seq_dump_local_addrs(struct seq_file *seq, struct sctp_ep_commo
>   		}
>   		af->seq_dump_addr(seq, addr);
>   	}
> +	rcu_read_unlock();
>   }
>
>   /* Dump remote addresses of an association. */
>

May want to avoid printing addresses that are !addr->valid.

Otherwise looks good.

-vlad

^ permalink raw reply

* [PATCH] sctp: Add RCU protection to assoc->transport_addr_list
From: Thomas Graf @ 2012-12-06 18:15 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev, Vlad Yasevich, Neil Horman

peer.transport_addr_list is currently only protected by sk_sock
which is inpractical to acquire for procfs dumping purposes.

This patch adds RCU protection allowing for the procfs readers to
enter RCU read-side critical sections.

Modification of the list continues to be serialized via sk_lock.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 include/net/sctp/structs.h |  2 ++
 net/sctp/associola.c       |  4 ++--
 net/sctp/proc.c            |  8 ++++++--
 net/sctp/transport.c       | 18 +++++++++++++-----
 4 files changed, 23 insertions(+), 9 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 64158aa..5d6987b 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -948,6 +948,8 @@ struct sctp_transport {
 
 	/* 64-bit random number sent with heartbeat. */
 	__u64 hb_nonce;
+
+	struct rcu_head rcu;
 };
 
 struct sctp_transport *sctp_transport_new(struct net *, const union sctp_addr *,
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index b1ef3bc..c826bb8 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -565,7 +565,7 @@ void sctp_assoc_rm_peer(struct sctp_association *asoc,
 		sctp_assoc_update_retran_path(asoc);
 
 	/* Remove this peer from the list. */
-	list_del(&peer->transports);
+	list_del_rcu(&peer->transports);
 
 	/* Get the first transport of asoc. */
 	pos = asoc->peer.transport_addr_list.next;
@@ -765,7 +765,7 @@ struct sctp_transport *sctp_assoc_add_peer(struct sctp_association *asoc,
 	peer->state = peer_state;
 
 	/* Attach the remote transport to our asoc.  */
-	list_add_tail(&peer->transports, &asoc->peer.transport_addr_list);
+	list_add_tail_rcu(&peer->transports, &asoc->peer.transport_addr_list);
 	asoc->peer.transport_count++;
 
 	/* If we do not yet have a primary path, set one.  */
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index ec9b0c8..36a3f9d 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -159,7 +159,8 @@ static void sctp_seq_dump_remote_addrs(struct seq_file *seq, struct sctp_associa
 	struct sctp_af *af;
 
 	primary = &assoc->peer.primary_addr;
-	list_for_each_entry(transport, &assoc->peer.transport_addr_list,
+	rcu_read_lock();
+	list_for_each_entry_rcu(transport, &assoc->peer.transport_addr_list,
 			transports) {
 		addr = &transport->ipaddr;
 		af = sctp_get_af_specific(addr->sa.sa_family);
@@ -168,6 +169,7 @@ static void sctp_seq_dump_remote_addrs(struct seq_file *seq, struct sctp_associa
 		}
 		af->seq_dump_addr(seq, addr);
 	}
+	rcu_read_unlock();
 }
 
 static void * sctp_eps_seq_start(struct seq_file *seq, loff_t *pos)
@@ -438,11 +440,12 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 	head = &sctp_assoc_hashtable[hash];
 	sctp_local_bh_disable();
 	read_lock(&head->lock);
+	rcu_read_lock();
 	sctp_for_each_hentry(epb, node, &head->chain) {
 		if (!net_eq(sock_net(epb->sk), seq_file_net(seq)))
 			continue;
 		assoc = sctp_assoc(epb);
-		list_for_each_entry(tsp, &assoc->peer.transport_addr_list,
+		list_for_each_entry_rcu(tsp, &assoc->peer.transport_addr_list,
 					transports) {
 			/*
 			 * The remote address (ADDR)
@@ -489,6 +492,7 @@ static int sctp_remaddr_seq_show(struct seq_file *seq, void *v)
 		}
 	}
 
+	rcu_read_unlock();
 	read_unlock(&head->lock);
 	sctp_local_bh_enable();
 
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index 206cf52..1295aec 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -163,13 +163,11 @@ void sctp_transport_free(struct sctp_transport *transport)
 	sctp_transport_put(transport);
 }
 
-/* Destroy the transport data structure.
- * Assumes there are no more users of this structure.
- */
-static void sctp_transport_destroy(struct sctp_transport *transport)
+static void sctp_transport_destroy_rcu(struct rcu_head *head)
 {
-	SCTP_ASSERT(transport->dead, "Transport is not dead", return);
+	struct sctp_transport *transport;
 
+	transport = container_of(head, struct sctp_transport, rcu);
 	if (transport->asoc)
 		sctp_association_put(transport->asoc);
 
@@ -180,6 +178,16 @@ static void sctp_transport_destroy(struct sctp_transport *transport)
 	SCTP_DBG_OBJCNT_DEC(transport);
 }
 
+/* Destroy the transport data structure.
+ * Assumes there are no more users of this structure.
+ */
+static void sctp_transport_destroy(struct sctp_transport *transport)
+{
+	SCTP_ASSERT(transport->dead, "Transport is not dead", return);
+
+	call_rcu(&transport->rcu, sctp_transport_destroy_rcu);
+}
+
 /* Start T3_rtx timer if it is not already running and update the heartbeat
  * timer.  This routine is called every time a DATA chunk is sent.
  */
-- 
1.7.11.7

^ permalink raw reply related

* Re: [PATCH net-next 0/7] Allow to monitor multicast cache event via rtnetlink
From: Thomas Graf @ 2012-12-06 17:49 UTC (permalink / raw)
  To: Nicolas Dichtel; +Cc: David Miller, David.Laight, netdev
In-Reply-To: <50C05AC2.1050504@6wind.com>

On 12/06/12 at 09:43am, Nicolas Dichtel wrote:
> Le 05/12/2012 18:54, David Miller a écrit :
> >From: "David Laight" <David.Laight@ACULAB.COM>
> >Date: Wed, 5 Dec 2012 11:41:33 -0000
> >
> >>Probably worth commenting that the 64bit items might only be 32bit aligned.
> >>Just to stop anyone trying to read/write them with pointer casts.
> >
> >Rather, let's not create this situation at all.
> >
> >It's totally inappropriate to have special code to handle every single
> >time we want to put 64-bit values into netlink messages.
> >
> >We need a real solution to this issue.
> >
> The easiest way is to update *_ALIGNTO values (maybe we can keep
> NLMSG_ALIGNTO to 4). But I think that many userland apps have these
> values hardcoded and, the most important thing, this may increase
> size of many netlink messages. Hence we need probably to find
> something better.

We can't do this, as you say, ALIGNTO is compiled into all the
binaries.

A simple backwards compatible workaround would be to include an
unknown, empty padding attribute if needed. That would be 4 bytes
in size and could be used to include padding as needed.

We could use nla_type = 0 as it is a reserved value that should
be available in all protocols. All readers (kernel and user space)
must ignore such an attribute just like any other unknown
attribute they encounter.

We could easily extend nla_put_u64() and variants to automatically
include such a padding attribute as needed.

The only situation that I can think of where this would not work
is if we have code like this:

   foo = nla_nest_start();
   for ([..])
      nla_put_u64([...])
   nla_nest_end([...])

and a reader would stupidly do a nla_for_each_attr() in user space
and assume all attributes found must be NLA_U64 without even
checking the length of the attribute.

I would say we take that risk and let such code die horribly.

^ permalink raw reply

* Hi!
From: Marketing Commucation @ 2012-12-06 17:18 UTC (permalink / raw)


I am Mr. Joseph. I have a lucrative business proposal of mutual interest to 
share with you.

^ permalink raw reply

* [PATCH] sctp: proc: protect bind_addr->address_list accesses with rcu_read_lock()
From: Thomas Graf @ 2012-12-06 17:34 UTC (permalink / raw)
  To: linux-sctp; +Cc: netdev, Vlad Yasevich, Neil Horman

address_list is protected via the socket lock or RCU. Since we don't want
to take the socket lock for each assoc we dump in procfs a RCU read-side
critical section must be entered.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 net/sctp/proc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 9966e7b..ec9b0c8 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -139,7 +139,8 @@ static void sctp_seq_dump_local_addrs(struct seq_file *seq, struct sctp_ep_commo
 	    primary = &peer->saddr;
 	}
 
-	list_for_each_entry(laddr, &epb->bind_addr.address_list, list) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(laddr, &epb->bind_addr.address_list, list) {
 		addr = &laddr->a;
 		af = sctp_get_af_specific(addr->sa.sa_family);
 		if (primary && af->cmp_addr(addr, primary)) {
@@ -147,6 +148,7 @@ static void sctp_seq_dump_local_addrs(struct seq_file *seq, struct sctp_ep_commo
 		}
 		af->seq_dump_addr(seq, addr);
 	}
+	rcu_read_unlock();
 }
 
 /* Dump remote addresses of an association. */
-- 
1.7.11.7

^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-06 16:56 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121206161200.GA4340@redhat.com>

On Thursday, December 06, 2012 06:12:00 PM Michael S. Tsirkin wrote:
> On Thu, Dec 06, 2012 at 10:46:11AM -0500, Paul Moore wrote:
> > On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> > > OK so just to verify: this can be used to ensure that qemu
> > > process that has the queue fd can only attach it to
> > > a specific device, right?
> > 
> > Whenever a new queue is created via TUNSETQUEUE/tun_set_queue() the
> > security_tun_dev_create_queue() LSM hook is called.  When SELinux is
> > enabled this hook ends up calling selinux_tun_dev_create_queue() which
> > checks that the calling process (process_t) is allowed to create a new
> > queue on the specified device (tundev_t) .  If you are familiar with
> > SELinux security policy, the allow rule would look like this:
> >
> >   allow process_t tundev_t:tun_socket create_queue;
> > 
> > In practice, if we assume libvirt is creating the TUN device and running
> > with a SELinux label of virtd_t and that QEMU instances are running with
> > a SELinux label of svirt_t then the allow rule would look like this:
> >
> >   allow svirt_t virtd_t:tun_socket create_queue;
> > 
> > There is also the matter of the MLS/MCS constraints providing additional
> > separation but that is another level of detail which I don't believe is
> > important for our discussion.
> 
> Hmm. How do the rules for SETIFF look ATM?
> I am just checking default policy does not let qemu do with
> SETQUEUE something with a device which it can not
> attach to using SETIFF.

The SETQUEUE/tun_socket:create_queue permissions do not yet exist in any 
released SELinux policy as we are just now adding them with this patchset.  
With current policies loaded into a kernel with this patchset applied the 
SETQUEUE/tun_socket:create_queue permission would be treated according to the 
policy's unknown permission setting.

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply

* Re: build failure of latest ethtool
From: Or Gerlitz @ 2012-12-06 16:33 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev
In-Reply-To: <1354811376.2828.1.camel@bwh-desktop.uk.solarflarecom.com>

On 06/12/2012 18:29, Ben Hutchings wrote:
> If you want to build from git you have to run: ./autogen.sh && 
> ./configure && make

OK, sorry for the spam, I pulled the tree but didn't re-run ./autogen.sh 
&& ./configure

^ permalink raw reply

* Re: build failure of latest ethtool
From: Ben Hutchings @ 2012-12-06 16:29 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: netdev
In-Reply-To: <50C0C276.5060809@mellanox.com>

On Thu, 2012-12-06 at 18:06 +0200, Or Gerlitz wrote:
> Hi Ben,
> 
> FYI, building the latest ethtool git fails since sfpid.c is not present 
> in the Makefile.
[...]

If you want to build from git you have to run:

./autogen.sh && ./configure && make

Ben.

-- 
Ben Hutchings, Staff Engineer, Solarflare
Not speaking for my employer; that's the marketing department's job.
They asked us to note that Solarflare product names are trademarked.

^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Michael S. Tsirkin @ 2012-12-06 16:12 UTC (permalink / raw)
  To: Paul Moore; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <7448004.siKCIqQqTi@sifl>

On Thu, Dec 06, 2012 at 10:46:11AM -0500, Paul Moore wrote:
> On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> > On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > > This patch corrects some problems with LSM/SELinux that were introduced
> > > with the multiqueue patchset.  The problem stems from the fact that the
> > > multiqueue work changed the relationship between the tun device and its
> > > associated socket; before the socket persisted for the life of the
> > > device, however after the multiqueue changes the socket only persisted
> > > for the life of the userspace connection (fd open).  For non-persistent
> > > devices this is not an issue, but for persistent devices this can cause
> > > the tun device to lose its SELinux label.
> > > 
> > > We correct this problem by adding an opaque LSM security blob to the
> > > tun device struct which allows us to have the LSM security state, e.g.
> > > SELinux labeling information, persist for the lifetime of the tun
> > > device.  In the process we tweak the LSM hooks to work with this new
> > > approach to TUN device/socket labeling and introduce a new LSM hook,
> > > security_tun_dev_create_queue(), to approve requests to create a new
> > > TUN queue via TUNSETQUEUE.
> > > 
> > > The SELinux code has been adjusted to match the new LSM hooks, the
> > > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > > use of the recently added "tun_socket:create_queue" permission to
> > > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > > policies which do not define the "tun_socket:create_queue" permission
> > > the access control decision for TUNSETQUEUE will be handled according
> > > to the SELinux policy's unknown permission setting.
> > > 
> > > Signed-off-by: Paul Moore <pmoore@redhat.com>
> > 
> > OK so just to verify: this can be used to ensure that qemu
> > process that has the queue fd can only attach it to
> > a specific device, right?
> 
> Whenever a new queue is created via TUNSETQUEUE/tun_set_queue() the 
> security_tun_dev_create_queue() LSM hook is called.  When SELinux is enabled 
> this hook ends up calling selinux_tun_dev_create_queue() which checks that the 
> calling process (process_t) is allowed to create a new queue on the specified 
> device (tundev_t) .  If you are familiar with SELinux security policy, the 
> allow rule would look like this:
> 
>   allow process_t tundev_t:tun_socket create_queue;
> 
> In practice, if we assume libvirt is creating the TUN device and running with 
> a SELinux label of virtd_t and that QEMU instances are running with a SELinux 
> label of svirt_t then the allow rule would look like this:
> 
>   allow svirt_t virtd_t:tun_socket create_queue;
> 
> There is also the matter of the MLS/MCS constraints providing additional 
> separation but that is another level of detail which I don't believe is 
> important for our discussion.

Hmm. How do the rules for SETIFF look ATM?
I am just checking default policy does not let qemu do with
SETQUEUE something with a device which it can not
attach to using SETIFF.

> -- 
> paul moore
> security and virtualization @ redhat

^ permalink raw reply

* build failure of latest ethtool
From: Or Gerlitz @ 2012-12-06 16:06 UTC (permalink / raw)
  To: Ben Hutchings; +Cc: netdev

Hi Ben,

FYI, building the latest ethtool git fails since sfpid.c is not present 
in the Makefile. Once I built it manually and added to the linkage line, 
ethtool  builds OK.

$ gcc -Wall -g -O2   -o ethtool ethtool.o amd8111e.o de2104x.o e100.o 
e1000.o igb.o fec_8xx.o ibm_emac.o ixgb.o ixgbe.o natsemi.o pcnet32.o 
realtek.o tg3.o marvell.o vioc.o smsc911x.o at76c50x-usb.o sfc.o 
stmmac.o rxclass.o
ethtool.o: In function `do_getmodule':
/upstream/ethtool/ethtool.c:3616: undefined reference to `sff8079_show_all'
collect2: ld returned 1 exit status
make[1]: *** [ethtool] Error 1

^ permalink raw reply

* [net-next PATCH] tun: correctly report an error in tun_flow_init()
From: Paul Moore @ 2012-12-06 15:48 UTC (permalink / raw)
  To: netdev; +Cc: jasowang

On error, the error code from tun_flow_init() is lost inside
tun_set_iff(), this patch fixes this by assigning the tun_flow_init()
error code to the "err" variable which is returned by
the tun_flow_init() function on error.

Signed-off-by: Paul Moore <pmoore@redhat.com>
---
 drivers/net/tun.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index a1b2389..14a0454 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1591,7 +1591,8 @@ static int tun_set_iff(struct net *net, struct file *file, struct ifreq *ifr)
 
 		tun_net_init(dev);
 
-		if (tun_flow_init(tun))
+		err = tun_flow_init(tun);
+		if (err < 0)
 			goto err_free_dev;
 
 		dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |

^ permalink raw reply related

* Re: [RFC PATCH v2 1/3] tun: correctly report an error in tun_flow_init()
From: Paul Moore @ 2012-12-06 15:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <2215330.qi9iHRh0XG@jason-thinkpad-t430s>

On Thursday, December 06, 2012 06:31:29 PM Jason Wang wrote:
> On Wednesday, December 05, 2012 03:26:04 PM Paul Moore wrote:
> > On error, the error code from tun_flow_init() is lost inside
> > tun_set_iff(), this patch fixes this by assigning the tun_flow_init()
> > error code to the "err" variable which is returned by
> > the tun_flow_init() function on error.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> > ---
> > 
> >  drivers/net/tun.c |    3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index a1b2389..14a0454 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -1591,7 +1591,8 @@ static int tun_set_iff(struct net *net, struct file
> > *file, struct ifreq *ifr)
> > 
> >                 tun_net_init(dev);
> > 
> > -               if (tun_flow_init(tun))
> > +               err = tun_flow_init(tun);
> > +               if (err < 0)
> > 
> >                         goto err_free_dev;
> >                 
> >                 dev->hw_features = NETIF_F_SG | NETIF_F_FRAGLIST |
> > 
> > --
> 
> Looks fine, we can separate this out of this series and replace the RFC with
> net-next to let David apply it soon.

Will do shortly.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-06 15:46 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: netdev, linux-security-module, selinux, jasowang
In-Reply-To: <20121206103325.GG10837@redhat.com>

On Thursday, December 06, 2012 12:33:25 PM Michael S. Tsirkin wrote:
> On Wed, Dec 05, 2012 at 03:26:19PM -0500, Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>
> 
> OK so just to verify: this can be used to ensure that qemu
> process that has the queue fd can only attach it to
> a specific device, right?

Whenever a new queue is created via TUNSETQUEUE/tun_set_queue() the 
security_tun_dev_create_queue() LSM hook is called.  When SELinux is enabled 
this hook ends up calling selinux_tun_dev_create_queue() which checks that the 
calling process (process_t) is allowed to create a new queue on the specified 
device (tundev_t) .  If you are familiar with SELinux security policy, the 
allow rule would look like this:

  allow process_t tundev_t:tun_socket create_queue;

In practice, if we assume libvirt is creating the TUN device and running with 
a SELinux label of virtd_t and that QEMU instances are running with a SELinux 
label of svirt_t then the allow rule would look like this:

  allow svirt_t virtd_t:tun_socket create_queue;

There is also the matter of the MLS/MCS constraints providing additional 
separation but that is another level of detail which I don't believe is 
important for our discussion.

-- 
paul moore
security and virtualization @ redhat


^ permalink raw reply

* [PATCH net] inet_diag: fix oops for IPv4 AF_INET6 TCP SYN-RECV state
From: Neal Cardwell @ 2012-12-06 15:42 UTC (permalink / raw)
  To: David Miller; +Cc: edumazet, netdev, Neal Cardwell

Fix inet_diag to be aware of the fact that AF_INET6 TCP connections
instantiated for IPv4 traffic and in the SYN-RECV state were actually
created with inet_reqsk_alloc(), instead of inet6_reqsk_alloc(). This
means that for such connections inet6_rsk(req) returns a pointer to a
random spot in memory up to roughly 64KB beyond the end of the
request_sock.

With this bug, for a server using AF_INET6 TCP sockets and serving
IPv4 traffic, an inet_diag user like `ss state SYN-RECV` would lead to
inet_diag_fill_req() causing an oops or the export to user space of 16
bytes of kernel memory as a garbage IPv6 address, depending on where
the garbage inet6_rsk(req) pointed.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
---
 net/ipv4/inet_diag.c |   53 ++++++++++++++++++++++++++++++++++++-------------
 1 files changed, 39 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 0c34bfa..35c3de4 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -44,6 +44,10 @@ struct inet_diag_entry {
 	u16 dport;
 	u16 family;
 	u16 userlocks;
+#if IS_ENABLED(CONFIG_IPV6)
+	struct in6_addr saddr_storage;	/* for IPv4-mapped-IPv6 addresses */
+	struct in6_addr daddr_storage;	/* for IPv4-mapped-IPv6 addresses */
+#endif
 };
 
 static DEFINE_MUTEX(inet_diag_table_mutex);
@@ -67,6 +71,36 @@ static inline void inet_diag_unlock_handler(
 	mutex_unlock(&inet_diag_table_mutex);
 }
 
+
+/* Get the IPv4, IPv6, or IPv4-mapped-IPv6 local and remote addresses
+ * from a request_sock. For IPv4-mapped-IPv6 we must map IPv4 to IPv6.
+ */
+static inline void inet_diag_req_addrs(const struct sock *sk,
+				       const struct request_sock *req,
+				       struct inet_diag_entry *entry)
+{
+	struct inet_request_sock *ireq = inet_rsk(req);
+
+#if IS_ENABLED(CONFIG_IPV6)
+	if (sk->sk_family == AF_INET6) {
+		if (req->rsk_ops->family == AF_INET6) {
+			entry->saddr = inet6_rsk(req)->loc_addr.s6_addr32;
+			entry->daddr = inet6_rsk(req)->rmt_addr.s6_addr32;
+		} else if (req->rsk_ops->family == AF_INET) {
+			ipv6_addr_set_v4mapped(ireq->loc_addr,
+					       &entry->saddr_storage);
+			ipv6_addr_set_v4mapped(ireq->rmt_addr,
+					       &entry->daddr_storage);
+			entry->saddr = entry->saddr_storage.s6_addr32;
+			entry->daddr = entry->daddr_storage.s6_addr32;
+		}
+		return;
+	}
+#endif
+	entry->saddr = &ireq->loc_addr;
+	entry->daddr = &ireq->rmt_addr;
+}
+
 int inet_sk_diag_fill(struct sock *sk, struct inet_connection_sock *icsk,
 			      struct sk_buff *skb, struct inet_diag_req_v2 *req,
 			      struct user_namespace *user_ns,		      	
@@ -637,8 +671,10 @@ static int inet_diag_fill_req(struct sk_buff *skb, struct sock *sk,
 	r->idiag_inode = 0;
 #if IS_ENABLED(CONFIG_IPV6)
 	if (r->idiag_family == AF_INET6) {
-		*(struct in6_addr *)r->id.idiag_src = inet6_rsk(req)->loc_addr;
-		*(struct in6_addr *)r->id.idiag_dst = inet6_rsk(req)->rmt_addr;
+		struct inet_diag_entry entry;
+		inet_diag_req_addrs(sk, req, &entry);
+		memcpy(r->id.idiag_src, entry.saddr, sizeof(struct in6_addr));
+		memcpy(r->id.idiag_dst, entry.daddr, sizeof(struct in6_addr));
 	}
 #endif
 
@@ -691,18 +727,7 @@ static int inet_diag_dump_reqs(struct sk_buff *skb, struct sock *sk,
 				continue;
 
 			if (bc) {
-				entry.saddr =
-#if IS_ENABLED(CONFIG_IPV6)
-					(entry.family == AF_INET6) ?
-					inet6_rsk(req)->loc_addr.s6_addr32 :
-#endif
-					&ireq->loc_addr;
-				entry.daddr =
-#if IS_ENABLED(CONFIG_IPV6)
-					(entry.family == AF_INET6) ?
-					inet6_rsk(req)->rmt_addr.s6_addr32 :
-#endif
-					&ireq->rmt_addr;
+				inet_diag_req_addrs(sk, req, &entry);
 				entry.dport = ntohs(ireq->rmt_port);
 
 				if (!inet_diag_bc_run(bc, &entry))
-- 
1.7.7.3

^ permalink raw reply related

* Re: [RFC PATCH v2 3/3] tun: fix LSM/SELinux labeling of tun/tap devices
From: Paul Moore @ 2012-12-06 15:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, linux-security-module, selinux, mst
In-Reply-To: <9040763.QsllgCP7TP@jason-thinkpad-t430s>

On Thursday, December 06, 2012 06:29:54 PM Jason Wang wrote:
> On Wednesday, December 05, 2012 03:26:19 PM Paul Moore wrote:
> > This patch corrects some problems with LSM/SELinux that were introduced
> > with the multiqueue patchset.  The problem stems from the fact that the
> > multiqueue work changed the relationship between the tun device and its
> > associated socket; before the socket persisted for the life of the
> > device, however after the multiqueue changes the socket only persisted
> > for the life of the userspace connection (fd open).  For non-persistent
> > devices this is not an issue, but for persistent devices this can cause
> > the tun device to lose its SELinux label.
> > 
> > We correct this problem by adding an opaque LSM security blob to the
> > tun device struct which allows us to have the LSM security state, e.g.
> > SELinux labeling information, persist for the lifetime of the tun
> > device.  In the process we tweak the LSM hooks to work with this new
> > approach to TUN device/socket labeling and introduce a new LSM hook,
> > security_tun_dev_create_queue(), to approve requests to create a new
> > TUN queue via TUNSETQUEUE.
> > 
> > The SELinux code has been adjusted to match the new LSM hooks, the
> > other LSMs do not make use of the LSM TUN controls.  This patch makes
> > use of the recently added "tun_socket:create_queue" permission to
> > restrict access to the TUNSETQUEUE operation.  On older SELinux
> > policies which do not define the "tun_socket:create_queue" permission
> > the access control decision for TUNSETQUEUE will be handled according
> > to the SELinux policy's unknown permission setting.
> > 
> > Signed-off-by: Paul Moore <pmoore@redhat.com>

...

> > @@ -4425,20 +4452,19 @@ static void selinux_tun_dev_post_create(struct
> > sock
> > *sk) * cause confusion to the TUN user that had no idea network labeling *
> > protocols were being used */
> > 
> > -	/* see the comments in selinux_tun_dev_create() about why we ...
> > -
> > -	sksec->sid = current_sid();
> > +	sksec->sid = tunsec->sid;
> 
> Since both tun_set_iff() and tun_set_queue() would call this. I wonder when
> it is called by tun_set_queue() we need some checking just like what we
> done in v1, otherwise it's unconditionally in TUNSETQUEUE. Or we can add
> them in selinux_tun_dev_create_queue()?

In all the cases that call tun_attach() we have a new socket which needs to be 
labeled based on the tun->security label, yes?  That is what the 
security_tun_dev_attach() code does, there is no need for access control at 
this point as the operation has already been authorized by either 
security_tun_dev_create() (new device), security_tun_dev_create_queue() (new 
queue), or security_tun_dev_open() (opening persistent device).

I think we are all set, or am I missing something?
 
> >  	sksec->sclass = SECCLASS_TUN_SOCKET;
> > 
> > +
> > +	return 0;
> > 
> >  }

-- 
paul moore
security and virtualization @ redhat

^ permalink raw reply

* [PATCH] net : enable tx time stamping in the vde driver.
From: Paul Chavent @ 2012-12-06 15:25 UTC (permalink / raw)
  To: jdike, richard, user-mode-linux-devel, netdev, richardcochran
  Cc: Paul Chavent

This new version moves the skb_tx_timestamp in the main uml
driver. This should avoid the need to call this function in each
transport (vde, slirp, tuntap, ...). It also add support for ethtool
get_ts_info.

Signed-off-by: Paul Chavent <paul.chavent@onera.fr>
---
 arch/um/drivers/net_kern.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c
index b1314eb..5aa8696 100644
--- a/arch/um/drivers/net_kern.c
+++ b/arch/um/drivers/net_kern.c
@@ -218,6 +218,7 @@ static int uml_net_start_xmit(struct sk_buff *skb, struct net_device *dev)
 	spin_lock_irqsave(&lp->lock, flags);

 	len = (*lp->write)(lp->fd, skb, lp);
+	skb_tx_timestampns(skb);

 	if (len == skb->len) {
 		dev->stats.tx_packets++;
@@ -281,6 +282,7 @@ static void uml_net_get_drvinfo(struct net_device *dev,
 static const struct ethtool_ops uml_net_ethtool_ops = {
 	.get_drvinfo	= uml_net_get_drvinfo,
 	.get_link	= ethtool_op_get_link,
+	.get_ts_info	= ethtool_op_get_ts_info,
 };

 static void uml_net_user_timer_expire(unsigned long _conn)
-- 
1.7.12.1

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox