Netdev List
 help / color / mirror / Atom feed
* [PATCH] bna: bnad_udelay macro cleanup
From: Rasesh Mody @ 2010-12-23  7:05 UTC (permalink / raw)
  To: netdev; +Cc: Rasesh Mody, Debashis Dutt

Change details:
	- Removed unnecessary bnad_udelay macro for ia64

Signed-off-by: Debashis Dutt <ddutt@brocade.com>
Signed-off-by: Rasesh Mody <rmody@brocade.com>
---
 drivers/net/bna/bnad.c |    4 ++--
 drivers/net/bna/bnad.h |    6 ------
 2 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/drivers/net/bna/bnad.c b/drivers/net/bna/bnad.c
index 277dbe0..12d1d7a 100644
--- a/drivers/net/bna/bnad.c
+++ b/drivers/net/bna/bnad.c
@@ -925,7 +925,7 @@ bnad_cb_tx_cleanup(struct bnad *bnad, struct bna_tcb *tcb)
 {
 	/* Delay only once for the whole Tx Path Shutdown */
 	if (!test_and_set_bit(BNAD_RF_TX_SHUTDOWN_DELAYED, &bnad->run_flags))
-		bnad_udelay(BNAD_TXRX_SYNC_UDELAY);
+		udelay(BNAD_TXRX_SYNC_UDELAY);
 }
 
 static void
@@ -938,7 +938,7 @@ bnad_cb_rx_cleanup(struct bnad *bnad,
 		clear_bit(BNAD_RXQ_STARTED, &ccb->rcb[1]->flags);
 
 	if (!test_and_set_bit(BNAD_RF_RX_SHUTDOWN_DELAYED, &bnad->run_flags))
-		bnad_udelay(BNAD_TXRX_SYNC_UDELAY);
+		udelay(BNAD_TXRX_SYNC_UDELAY);
 }
 
 static void
diff --git a/drivers/net/bna/bnad.h b/drivers/net/bna/bnad.h
index 24fd983..8b1d515 100644
--- a/drivers/net/bna/bnad.h
+++ b/drivers/net/bna/bnad.h
@@ -336,10 +336,4 @@ extern void bnad_netdev_hwstats_fill(struct bnad *bnad,
 	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) && 		\
 	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
 
-#if defined(__ia64__)
-#define bnad_udelay	udelay
-#else
-#define bnad_udelay	__udelay
-#endif
-
 #endif /* __BNAD_H__ */
-- 
1.7.1


^ permalink raw reply related

* Re: [PATCH 00/12] make rpc_pipefs be mountable multiple times
From: Kirill A. Shutemov @ 2010-12-23  6:50 UTC (permalink / raw)
  To: J. Bruce Fields
  Cc: Kirill A. Shutemov, Trond Myklebust, Neil Brown, Pavel Emelyanov,
	linux-nfs, David S. Miller, netdev, linux-kernel
In-Reply-To: <20101221234520.GA30525@fieldses.org>

On Tue, Dec 21, 2010 at 06:45:21PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 22, 2010 at 01:32:15AM +0200, Kirill A. Shutemov wrote:
> > On Mon, Dec 20, 2010 at 09:46:44AM -0500, J. Bruce Fields wrote:
> > > By the way, was there ever a resolution to Trond's question?:
> > > 
> > > 	http://marc.info/?l=linux-nfs&m=128655758712817&w=2
> > > 
> > > 	"The keyring upcalls are currently initiated through the same
> > > 	mechanism as module_request and therefore get started with the
> > > 	init_nsproxy namespace. We'd really like them to run inside the
> > > 	same container as the process.  As part of the same problem,
> > > 	there is the issue of what to do with the dns resolver and
> > > 	Bryan's new keyring based idmapper code."
> > 
> > I'm not sure that I understand the problem correctly.
> > 
> > Currently, idmap uses dentry taken from client's cl_rpcclient->cl_path
> > (see nfs_idmap_new()). cl_rpcclient (and cl_path) is initialized with
> > rpcmount resolved against mount namespace of mount process (see
> > nfs_create_rpc_client()).
> > I assume it's correct.
> 
> There's actually two separate sets of idmapper code; look at
> fs/nfs/idmapper.c, the first part of the file (between #ifdef
> CONFIG_NFS_USE_NEW_IDMAPPER and #else) is idmapping code that uses
> request_key().  The code you're looking at (including nfs_idmap_new())
> is later in the file, and deprecated.

IIUC, we need to save nsproxy of mount process in struct nfs_client and
pass it down to request_key(). I think it's outside of this patchset.

-- 
 Kirill A. Shutemov

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: Richard Cochran @ 2010-12-23  6:13 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: john stultz, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimmTzH8+fSYmbajqZ+hU5Ps-UZaTp_1TYzjHB6P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, Dec 23, 2010 at 05:27:58AM +0900, Kuwahara,T. wrote:
> On Wed, Dec 22, 2010 at 7:25 AM, john stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> > I don't see why that would be better then adding a
> > clear new mode flag?
> 
> In short, time step is a special case of time slew.  Those are the same,
> only different in one parameter, as is shown in my previous post.
> That's why I said there's no need for adding a new mode.

Well, in addition to the objections raised by John, your suggested
implementation is also shortsighted. The field timex.constant is
copied into time_constant in some code paths. Obviously, this would be
a bad thing when timex.constant==-huge.

So, you need to clarify the interaction between ADJ_OFFSET,
ADJ_TIMECONST, ADJ_TAI, timex.constant, time_constant, and MAXTC.

If you would fully implement your idea, I expect it would become
obvious that it a bit of a hack, both in the kernel code and in the
user space interface. But, if you disagree, please just post a patch
with the complete implementation...

Thanks,
Richard

^ permalink raw reply

* Re: [net-next-2.6 PATCH v2 1/3] net: implement mechanism for HW based QOS
From: John Fastabend @ 2010-12-23  5:29 UTC (permalink / raw)
  To: Johannes Berg
  Cc: davem@davemloft.net, netdev@vger.kernel.org, hadi@cyberus.ca,
	shemminger@vyatta.com, tgraf@infradead.org,
	eric.dumazet@gmail.com, bhutchings@solarflare.com,
	nhorman@tuxdriver.com
In-Reply-To: <1293009149.3531.12.camel@jlt3.sipsolutions.net>

On 12/22/2010 1:12 AM, Johannes Berg wrote:
> On Tue, 2010-12-21 at 11:28 -0800, John Fastabend wrote:
>> This patch provides a mechanism for lower layer devices to
>> steer traffic using skb->priority to tx queues. This allows
>> for hardware based QOS schemes to use the default qdisc without
>> incurring the penalties related to global state and the qdisc
>> lock. While reliably receiving skbs on the correct tx ring
>> to avoid head of line blocking resulting from shuffling in
>> the LLD. Finally, all the goodness from txq caching and xps/rps
>> can still be leveraged.
> 
> Is there any chance this might be applicable to the 802.11 layer as
> well? We will definitely still need an ndo_select_queue handler to reset
> in the case where the peer doesn't support QoS, but it seems the part
> that depends on the frame itself could be pushed out to the generic
> framework instead of having net/wireless/util.c:cfg80211_classify8021d?
> 
> johannes
> 

Johannes,

I took a quick look at this and I believe it should be doable. It would be nice to completely remove the ndo_select_queue if possible though.

I probably won't have a chance to look any further into this for at least a week maybe two. So I'll think about it a bit more later unless someone else gets there first.

Thanks,
John.

> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


^ permalink raw reply

* Re: Using ethernet device as efficient small packet generator
From: juice @ 2010-12-23  5:15 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger, netdev

> Reaching 1Gbs should not be a problem (I was speaking about 10Gbps)
> I reach link speed with my tg3 card and one single cpu :)
> (Broadcom Corporation NetXtreme BCM5715S Gigabit Ethernet (rev a3))
>
> Please provide : ethtool -S eth0
>

This is from the e1000 interface:
03:02.1 Ethernet controller: Intel Corporation 82546EB Gigabit Ethernet
Controller (Copper) (rev 01)

root@a2labralinux:/home/juice# ethtool -S eth1
NIC statistics:
     rx_packets: 192069
     tx_packets: 60000313
     rx_bytes: 33850492
     tx_bytes: 3840026215
     rx_broadcast: 192069
     tx_broadcast: 3
     rx_multicast: 0
     tx_multicast: 310
     rx_errors: 0
     tx_errors: 0
     tx_dropped: 0
     multicast: 0
     collisions: 0
     rx_length_errors: 0
     rx_over_errors: 0
     rx_crc_errors: 0
     rx_frame_errors: 0
     rx_no_buffer_count: 0
     rx_missed_errors: 0
     tx_aborted_errors: 0
     tx_carrier_errors: 0
     tx_fifo_errors: 0
     tx_heartbeat_errors: 0
     tx_window_errors: 0
     tx_abort_late_coll: 0
     tx_deferred_ok: 0
     tx_single_coll_ok: 0
     tx_multi_coll_ok: 0
     tx_timeout_count: 0
     tx_restart_queue: 1806437
     rx_long_length_errors: 0
     rx_short_length_errors: 0
     rx_align_errors: 0
     tx_tcp_seg_good: 0
     tx_tcp_seg_failed: 0
     rx_flow_control_xon: 0
     rx_flow_control_xoff: 0
     tx_flow_control_xon: 0
     tx_flow_control_xoff: 0
     rx_long_byte_count: 33850492
     rx_csum_offload_good: 8978
     rx_csum_offload_errors: 0
     rx_header_split: 0
     alloc_rx_buff_failed: 0
     tx_smbus: 0
     rx_smbus: 0
     dropped_smbus: 0


This is from the tg3 interface:
05:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5761
Gigabit Ethernet PCIe (rev 10)

root@d8labralinux:/home/juice# ethtool -S eth2
NIC statistics:
     rx_octets: 10814
     rx_fragments: 0
     rx_ucast_packets: 20
     rx_mcast_packets: 0
     rx_bcast_packets: 26
     rx_fcs_errors: 0
     rx_align_errors: 0
     rx_xon_pause_rcvd: 0
     rx_xoff_pause_rcvd: 0
     rx_mac_ctrl_rcvd: 0
     rx_xoff_entered: 0
     rx_frame_too_long_errors: 0
     rx_jabbers: 0
     rx_undersize_packets: 0
     rx_in_length_errors: 0
     rx_out_length_errors: 0
     rx_64_or_less_octet_packets: 0
     rx_65_to_127_octet_packets: 0
     rx_128_to_255_octet_packets: 0
     rx_256_to_511_octet_packets: 0
     rx_512_to_1023_octet_packets: 0
     rx_1024_to_1522_octet_packets: 0
     rx_1523_to_2047_octet_packets: 0
     rx_2048_to_4095_octet_packets: 0
     rx_4096_to_8191_octet_packets: 0
     rx_8192_to_9022_octet_packets: 0
     tx_octets: 5120013863
     tx_collisions: 0
     tx_xon_sent: 0
     tx_xoff_sent: 0
     tx_flow_control: 0
     tx_mac_errors: 0
     tx_single_collisions: 0
     tx_mult_collisions: 0
     tx_deferred: 0
     tx_excessive_collisions: 0
     tx_late_collisions: 0
     tx_collide_2times: 0
     tx_collide_3times: 0
     tx_collide_4times: 0
     tx_collide_5times: 0
     tx_collide_6times: 0
     tx_collide_7times: 0
     tx_collide_8times: 0
     tx_collide_9times: 0
     tx_collide_10times: 0
     tx_collide_11times: 0
     tx_collide_12times: 0
     tx_collide_13times: 0
     tx_collide_14times: 0
     tx_collide_15times: 0
     tx_ucast_packets: 80000034
     tx_mcast_packets: 42
     tx_bcast_packets: 40
     tx_carrier_sense_errors: 0
     tx_discards: 0
     tx_errors: 0
     dma_writeq_full: 0
     dma_write_prioq_full: 0
     rxbds_empty: 0
     rx_discards: 0
     rx_errors: 0
     rx_threshold_hit: 0
     dma_readq_full: 0
     dma_read_prioq_full: 0
     tx_comp_queue_full: 0
     ring_set_send_prod_index: 0
     ring_status_update: 0
     nic_irqs: 0
     nic_avoided_irqs: 0
     nic_tx_threshold_hit: 0




^ permalink raw reply

* Re: 2.6.36.2 - loop on read /proc/net/tcp
From: Eric Dumazet @ 2010-12-23  5:07 UTC (permalink / raw)
  To: Alexey Vlasov, David Miller; +Cc: linux-kernel, netdev
In-Reply-To: <20101222134343.GC19998@beaver.vrungel.ru>

Le mercredi 22 décembre 2010 à 16:43 +0300, Alexey Vlasov a écrit :
> Hi.
> 
> Has anyone seen such a bug at 2.6.36.2?
> # netstat -ntl
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address           Foreign Address         State
> tcp        0      0 81.176.228.2:60608      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:8099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8101       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8101       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:20037      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:8102       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:8102       0.0.0.0:*               LISTEN
> tcp        0      0 127.0.0.1:3399          0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20040      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:38985      0.0.0.0:*               LISTEN
> tcp        0      0 0.0.0.0:873             0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20041      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20042      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:3306       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:9099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:9099       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20043      0.0.0.0:*               LISTEN
> tcp        0      0 0.0.0.0:139             0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:9100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:9100       0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:20044      0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.5:33549      0.0.0.0:*               LISTEN
> ...
> First 30 lines are ok
> 
> but then go lines repeating in "eternal" loop:
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.3:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.4:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.7:80         0.0.0.0:*               LISTEN
> tcp        0      0 81.176.228.2:80         0.0.0.0:*               LISTEN
> 
> # cat /proc/net/tcp
> ...
> It can hang an hour or so. but not always actually.
> 
> # i=0; while [ "$i" -lt "10" ]; do time wc -l /proc/net/tcp; let "i = $i + 1"; done
> 614782727 /proc/net/tcp
> 
> real    18m42.066s
> user    0m12.620s
> sys     18m25.890s
> 19443 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 19503 /proc/net/tcp
> 
> real    0m0.040s
> sys     0m0.030s
> 19502 /proc/net/tcp
> 
> real    0m0.041s
> user    0m0.000s
> sys     0m0.040s
> 28525 /proc/net/tcp
> 
> real    0m0.059s
> user    0m0.000s
> sys     0m0.050s
> 19463 /proc/net/tcp
> 
> real    0m0.048s
> user    0m0.000s
> sys     0m0.040s
> 19521 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 54394 /proc/net/tcp
> 
> real    0m0.104s
> user    0m0.000s
> sys     0m0.100s
> 19479 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 19481 /proc/net/tcp
> 
> real    0m0.040s
> user    0m0.000s
> sys     0m0.030s
> 

Hi Alexey

Thanks a lot for your report.

Here is a fix.

(Incidentaly, this means accesses to 0x40000000 addresses dont trigger
faults, since we never BUG() at this point)

David, this is a stable candidate. (2.6.29 +)

Thanks !

[PATCH] tcp: fix listening_get_next()

Alexey Vlasov found /proc/net/tcp could sometime loop and display
millions of sockets in LISTEN state.

In 2.6.29, when we converted TCP hash tables to RCU, we left two
sk_next() calls in listening_get_next().

We must instead use sk_nulls_next() to properly detect an end of chain.

Reported-by: Alexey Vlasov <renton@renton.name>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
---
 net/ipv4/tcp_ipv4.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e13da6d..d978bb2 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2030,7 +2030,7 @@ static void *listening_get_next(struct seq_file *seq, void *cur)
 get_req:
 			req = icsk->icsk_accept_queue.listen_opt->syn_table[st->sbucket];
 		}
-		sk	  = sk_next(st->syn_wait_sk);
+		sk	  = sk_nulls_next(st->syn_wait_sk);
 		st->state = TCP_SEQ_STATE_LISTENING;
 		read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
 	} else {
@@ -2039,7 +2039,7 @@ get_req:
 		if (reqsk_queue_len(&icsk->icsk_accept_queue))
 			goto start_req;
 		read_unlock_bh(&icsk->icsk_accept_queue.syn_wait_lock);
-		sk = sk_next(sk);
+		sk = sk_nulls_next(sk);
 	}
 get_sk:
 	sk_nulls_for_each_from(sk, node) {



^ permalink raw reply related

* Re: [PATCH v7] kptr_restrict for hiding kernel pointers
From: Joe Perches @ 2010-12-23  4:10 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: linux-kernel, netdev, linux-security-module, jmorris,
	eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis
In-Reply-To: <1293069792.9820.342.camel@dan>

On Wed, 2010-12-22 at 21:03 -0500, Dan Rosenberg wrote:
> Add the %pK printk format specifier and
> the /proc/sys/kernel/kptr_restrict sysctl.

Another trivial style comment:

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> +	case 'K':
> +		/*
> +		 * %pK cannot be used in IRQ context because its test
> +		 * for CAP_SYSLOG would be meaningless.
> +		 */
> +		if (in_irq() || in_serving_softirq() || in_nmi()) {
> +			if (spec.field_width == -1)
> +				spec.field_width = 2 * sizeof(void *);
> +			return string(buf, end, "pK-error", spec);
> +		}
> +
> +		else if ((kptr_restrict == 0) ||
> +			 (kptr_restrict == 1 &&
> +			  has_capability_noaudit(current, CAP_SYSLOG)))
> +			break;
> +

---

> +		if (spec.field_width == -1) {
> +			spec.field_width = 2 * sizeof(void *);
> +			spec.flags |= ZEROPAD;
> +		}
> +		return number(buf, end, 0, spec);

It'd be slightly smaller code to use:

 		   ptr = 0;
		   break;

and delete the if block and return number.

>  	}
>  	spec.flags |= SMALL;
>  	if (spec.field_width == -1) {

^ permalink raw reply

* [PATCH] Convert net %p usage %pK
From: Dan Rosenberg @ 2010-12-23  3:22 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-sctp
  Cc: David S. Miller, Alexey Kuznetsov, James Morris,
	Pekka Savola (ipv6), Remi Denis-Courmont, Vlad Yasevich,
	Patrick McHardy, Sridhar Samudrala, Hideaki YOSHIFUJI, Tejun Heo,
	Eric Dumazet, Li Zefan, Joe Perches, Stephen Hemminger,
	Jamal Hadi Salim, Eric W. Biederman, Alexey Dobriyan, Jiri Pirko,
	Oliver Hartkopp, Urs Thuermann, Greg Kroah-Hartman,
	Daniel Lezcano, Pavel 

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's.  If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges.  Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".

The supporting code for kptr_restrict and %pK are currently in the -mm
tree.  This patch converts users of %p in net/ to %pK.  Cases of
printing pointers to the syslog are not covered, since this would
eliminate useful information for postmortem debugging and the reading of
the syslog is already optionally protected by the dmesg_restrict sysctl.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
---
 net/atm/proc.c           |    4 ++--
 net/can/bcm.c            |    6 +++---
 net/ipv4/raw.c           |    2 +-
 net/ipv4/tcp_ipv4.c      |    6 +++---
 net/ipv4/udp.c           |    2 +-
 net/ipv6/raw.c           |    2 +-
 net/ipv6/tcp_ipv6.c      |    6 +++---
 net/ipv6/udp.c           |    2 +-
 net/key/af_key.c         |    2 +-
 net/netlink/af_netlink.c |    2 +-
 net/packet/af_packet.c   |    2 +-
 net/phonet/socket.c      |    2 +-
 net/sctp/proc.c          |    4 ++--
 net/unix/af_unix.c       |    2 +-
 14 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/net/atm/proc.c b/net/atm/proc.c
index f85da07..be3afde 100644
--- a/net/atm/proc.c
+++ b/net/atm/proc.c
@@ -191,7 +191,7 @@ static void vcc_info(struct seq_file *seq, struct atm_vcc *vcc)
 {
 	struct sock *sk = sk_atm(vcc);
 
-	seq_printf(seq, "%p ", vcc);
+	seq_printf(seq, "%pK ", vcc);
 	if (!vcc->dev)
 		seq_printf(seq, "Unassigned    ");
 	else
@@ -218,7 +218,7 @@ static void svc_info(struct seq_file *seq, struct atm_vcc *vcc)
 {
 	if (!vcc->dev)
 		seq_printf(seq, sizeof(void *) == 4 ?
-			   "N/A@%p%10s" : "N/A@%p%2s", vcc, "");
+			   "N/A@%pK%10s" : "N/A@%pK%2s", vcc, "");
 	else
 		seq_printf(seq, "%3d %3d %5d         ",
 			   vcc->dev->number, vcc->vpi, vcc->vci);
diff --git a/net/can/bcm.c b/net/can/bcm.c
index 6faa825..1d2a0d6 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -165,9 +165,9 @@ static int bcm_proc_show(struct seq_file *m, void *v)
 	struct bcm_sock *bo = bcm_sk(sk);
 	struct bcm_op *op;
 
-	seq_printf(m, ">>> socket %p", sk->sk_socket);
-	seq_printf(m, " / sk %p", sk);
-	seq_printf(m, " / bo %p", bo);
+	seq_printf(m, ">>> socket %pK", sk->sk_socket);
+	seq_printf(m, " / sk %pK", sk);
+	seq_printf(m, " / bo %pK", bo);
 	seq_printf(m, " / dropped %lu", bo->dropped_usr_msgs);
 	seq_printf(m, " / bound %s", bcm_proc_getifname(ifname, bo->ifindex));
 	seq_printf(m, " <<<\n");
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index 1f85ef2..6cb9d20 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -949,7 +949,7 @@ static void raw_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
 	      srcp  = inet->inet_num;
 
 	seq_printf(seq, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		i, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
 		sk_rmem_alloc_get(sp),
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index e13da6d..86e46a0 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2389,7 +2389,7 @@ static void get_openreq4(struct sock *sk, struct request_sock *req,
 	int ttd = req->expires - jiffies;
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %p%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %u %d %pK%n",
 		i,
 		ireq->loc_addr,
 		ntohs(inet_sk(sk)->inet_sport),
@@ -2444,7 +2444,7 @@ static void get_tcp4_sock(struct sock *sk, struct seq_file *f, int i, int *len)
 		rx_queue = max_t(int, tp->rcv_nxt - tp->copied_seq, 0);
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X %02X %08X:%08X %02X:%08lX "
-			"%08X %5d %8d %lu %d %p %lu %lu %u %u %d%n",
+			"%08X %5d %8d %lu %d %pK %lu %lu %u %u %d%n",
 		i, src, srcp, dest, destp, sk->sk_state,
 		tp->write_seq - tp->snd_una,
 		rx_queue,
@@ -2479,7 +2479,7 @@ static void get_timewait4_sock(struct inet_timewait_sock *tw,
 	srcp  = ntohs(tw->tw_sport);
 
 	seq_printf(f, "%4d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK%n",
 		i, src, srcp, dest, destp, tw->tw_substate, 0, 0,
 		3, jiffies_to_clock_t(ttd), 0, 0, 0, 0,
 		atomic_read(&tw->tw_refcnt), tw, len);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 5e0a3a5..55dcf56 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2046,7 +2046,7 @@ static void udp4_format_sock(struct sock *sp, struct seq_file *f,
 	__u16 srcp	  = ntohs(inet->inet_sport);
 
 	seq_printf(f, "%5d: %08X:%04X %08X:%04X"
-		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d%n",
+		" %02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d%n",
 		bucket, src, srcp, dest, destp, sp->sk_state,
 		sk_wmem_alloc_get(sp),
 		sk_rmem_alloc_get(sp),
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 86c3952..9c883b7 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -1231,7 +1231,7 @@ static void raw6_sock_seq_show(struct seq_file *seq, struct sock *sp, int i)
 	srcp  = inet_sk(sp)->inet_num;
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 7e41e2c..60a2d56 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1975,7 +1975,7 @@ static void get_openreq6(struct seq_file *seq,
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3],
@@ -2026,7 +2026,7 @@ static void get_tcp6_sock(struct seq_file *seq, struct sock *sp, int i)
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %lu %lu %u %u %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %lu %lu %u %u %d\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
@@ -2068,7 +2068,7 @@ static void get_timewait6_sock(struct seq_file *seq,
 
 	seq_printf(seq,
 		   "%4d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %p\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %d %d %pK\n",
 		   i,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 91def93..ba25da4 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1395,7 +1395,7 @@ static void udp6_sock_seq_show(struct seq_file *seq, struct sock *sp, int bucket
 	srcp  = ntohs(inet->inet_sport);
 	seq_printf(seq,
 		   "%5d: %08X%08X%08X%08X:%04X %08X%08X%08X%08X:%04X "
-		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %p %d\n",
+		   "%02X %08X:%08X %02X:%08lX %08X %5d %8d %lu %d %pK %d\n",
 		   bucket,
 		   src->s6_addr32[0], src->s6_addr32[1],
 		   src->s6_addr32[2], src->s6_addr32[3], srcp,
diff --git a/net/key/af_key.c b/net/key/af_key.c
index d87c22d..cf00ddf 100644
--- a/net/key/af_key.c
+++ b/net/key/af_key.c
@@ -3643,7 +3643,7 @@ static int pfkey_seq_show(struct seq_file *f, void *v)
 	if (v == SEQ_START_TOKEN)
 		seq_printf(f ,"sk       RefCnt Rmem   Wmem   User   Inode\n");
 	else
-		seq_printf(f ,"%p %-6d %-6u %-6u %-6u %-6lu\n",
+		seq_printf(f, "%pK %-6d %-6u %-6u %-6u %-6lu\n",
 			       s,
 			       atomic_read(&s->sk_refcnt),
 			       sk_rmem_alloc_get(s),
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 478181d..31425fa 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -1990,7 +1990,7 @@ static int netlink_seq_show(struct seq_file *seq, void *v)
 		struct sock *s = v;
 		struct netlink_sock *nlk = nlk_sk(s);
 
-		seq_printf(seq, "%p %-3d %-6d %08x %-8d %-8d %p %-8d %-8d %-8lu\n",
+		seq_printf(seq, "%pK %-3d %-6d %08x %-8d %-8d %pK %-8d %-8d %-8lu\n",
 			   s,
 			   s->sk_protocol,
 			   nlk->pid,
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 8298e67..02b1b58 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2639,7 +2639,7 @@ static int packet_seq_show(struct seq_file *seq, void *v)
 		const struct packet_sock *po = pkt_sk(s);
 
 		seq_printf(seq,
-			   "%p %-6d %-4d %04x   %-5d %1d %-6u %-6u %-6lu\n",
+			   "%pK %-6d %-4d %04x   %-5d %1d %-6u %-6u %-6lu\n",
 			   s,
 			   atomic_read(&s->sk_refcnt),
 			   s->sk_type,
diff --git a/net/phonet/socket.c b/net/phonet/socket.c
index 25f746d..2c8f9d9 100644
--- a/net/phonet/socket.c
+++ b/net/phonet/socket.c
@@ -632,7 +632,7 @@ static int pn_sock_seq_show(struct seq_file *seq, void *v)
 		struct pn_sock *pn = pn_sk(sk);
 
 		seq_printf(seq, "%2d %04X:%04X:%02X %02X %08X:%08X %5d %lu "
-			"%d %p %d%n",
+			"%d %pK %d%n",
 			sk->sk_protocol, pn->sobject, 0, pn->resource,
 			sk->sk_state,
 			sk_wmem_alloc_get(sk), sk_rmem_alloc_get(sk),
diff --git a/net/sctp/proc.c b/net/sctp/proc.c
index 61aacfb..05a6ce2 100644
--- a/net/sctp/proc.c
+++ b/net/sctp/proc.c
@@ -212,7 +212,7 @@ static int sctp_eps_seq_show(struct seq_file *seq, void *v)
 	sctp_for_each_hentry(epb, node, &head->chain) {
 		ep = sctp_ep(epb);
 		sk = epb->sk;
-		seq_printf(seq, "%8p %8p %-3d %-3d %-4d %-5d %5d %5lu ", ep, sk,
+		seq_printf(seq, "%8pK %8pK %-3d %-3d %-4d %-5d %5d %5lu ", ep, sk,
 			   sctp_sk(sk)->type, sk->sk_state, hash,
 			   epb->bind_addr.port,
 			   sock_i_uid(sk), sock_i_ino(sk));
@@ -316,7 +316,7 @@ static int sctp_assocs_seq_show(struct seq_file *seq, void *v)
 		assoc = sctp_assoc(epb);
 		sk = epb->sk;
 		seq_printf(seq,
-			   "%8p %8p %-3d %-3d %-2d %-4d "
+			   "%8pK %8pK %-3d %-3d %-2d %-4d "
 			   "%4d %8d %8d %7d %5lu %-5d %5d ",
 			   assoc, sk, sctp_sk(sk)->type, sk->sk_state,
 			   assoc->state, hash,
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 2268e67..e6d7d04 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2225,7 +2225,7 @@ static int unix_seq_show(struct seq_file *seq, void *v)
 		struct unix_sock *u = unix_sk(s);
 		unix_state_lock(s);
 
-		seq_printf(seq, "%p: %08X %08X %08X %04X %02X %5lu",
+		seq_printf(seq, "%pK: %08X %08X %08X %04X %02X %5lu",
 			s,
 			atomic_read(&s->sk_refcnt),
 			0,

^ permalink raw reply related

* [PATCH v7] kptr_restrict for hiding kernel pointers
From: Dan Rosenberg @ 2010-12-23  2:03 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: jmorris, eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis

Add the %pK printk format specifier and
the /proc/sys/kernel/kptr_restrict sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's.  If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges.  Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".


v7 moves the extern to printk.h and cleans up the conditional statements
based on the suggestions of Joe Perches.

v6 removes the WARN_ONCE in favor of returning "pK-error" to avoid
breaking in certain cases, thanks to Ingo Molnar.

v5 sets kptr_restrict to a default value of 1, and properly handles the
case where it's incorrectly used in IRQ context. 

v4 incorporates Eric Paris' suggestion of using
has_capability_noaudit(), since failing this capability check is not a
policy violation but rather a code path choice and shouldn't generate
potentially excessive log noise.  Adjusted IRQ comment for clarity.

v3 adds the "2" setting, cleans up documentation, removes the CONFIG,
and incorporates changes and suggestions from Andrew Morton.

v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra.  Thanks to Thomas Graf for suggesting use of a centralized
format specifier.


Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Paris <eparis@parisplace.org>
---
 Documentation/sysctl/kernel.txt |   14 ++++++++++++++
 include/linux/printk.h          |    1 +
 kernel/sysctl.c                 |    9 +++++++++
 lib/vsprintf.c                  |   24 ++++++++++++++++++++++++
 4 files changed, 48 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..8ace8c4 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kptr_restrict
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If
 
 ==============================================================
 
+kptr_restrict:
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.  When
+kptr_restrict is set to (0), there are no restrictions.  When
+kptr_restrict is set to (1), the default, kernel pointers
+printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG.  When kptr_restrict is set to
+(2), kernel pointers printed using %pK will be replaced with 0's
+regardless of privileges.
+
+==============================================================
+
 kstack_depth_to_print: (X86 only)
 
 Controls the number of words to print when dumping the raw
diff --git a/include/linux/printk.h b/include/linux/printk.h
index b772ca5..9adfba6 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -83,6 +83,7 @@ extern bool printk_timed_ratelimit(unsigned long *caller_jiffies,
 
 extern int printk_delay_msec;
 extern int dmesg_restrict;
+extern int kptr_restrict;
 
 /*
  * Print a one-time message (analogous to WARN_ONCE() et al):
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..236fa91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
 	},
 #endif
 	{
+		.procname	= "kptr_restrict",
+		.data		= &kptr_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
 		.maxlen		= sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..ea556da 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, uuid, spec);
 }
 
+int kptr_restrict = 1;
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
  *       Implements a "recursive vsnprintf".
  *       Do not use this feature without some mechanism to verify the
  *       correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
  *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,27 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		return buf + vsnprintf(buf, end - buf,
 				       ((struct va_format *)ptr)->fmt,
 				       *(((struct va_format *)ptr)->va));
+	case 'K':
+		/*
+		 * %pK cannot be used in IRQ context because its test
+		 * for CAP_SYSLOG would be meaningless.
+		 */
+		if (in_irq() || in_serving_softirq() || in_nmi()) {
+			if (spec.field_width == -1)
+				spec.field_width = 2 * sizeof(void *);
+			return string(buf, end, "pK-error", spec);
+		}
+
+		else if ((kptr_restrict == 0) ||
+			 (kptr_restrict == 1 &&
+			  has_capability_noaudit(current, CAP_SYSLOG)))
+			break;
+
+		if (spec.field_width == -1) {
+			spec.field_width = 2 * sizeof(void *);
+			spec.flags |= ZEROPAD;
+		}
+		return number(buf, end, 0, spec);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {



^ permalink raw reply related

* Re: pull request: wireless-2.6 2010-12-22
From: David Miller @ 2010-12-23  1:35 UTC (permalink / raw)
  To: linville; +Cc: linux-wireless, netdev, linux-kernel
In-Reply-To: <20101222195209.GD10046@tuxdriver.com>

From: "John W. Linville" <linville@tuxdriver.com>
Date: Wed, 22 Dec 2010 14:52:09 -0500

> Dave,
> 
> Three more fixes intended for 2.6.37...
> 
> The one from Johannes Berg avoids a NULL pointer dereference in the mesh
> code.  The one from Johannes Stezenbach fixes bug 24892, which is a
> lock-up caused by rt2x00.  Finally, you probably recognize the one from
> Meelis Roos which removes some log spam coming from hostap.  These have
> all spent several days in linux-next without any adverse effects.
> 
> Please let me know if there are problems!

Pulled, thanks John.

^ permalink raw reply

* Re: [PATCH V7 1/8] ntp: add ADJ_SETOFFSET mode bit
From: john stultz @ 2010-12-23  0:00 UTC (permalink / raw)
  To: Kuwahara,T.
  Cc: Richard Cochran, linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA, netdev-u79uwXL29TY76Z2rM5mHXA,
	Alan Cox, Arnd Bergmann, Christoph Lameter, David Miller,
	Krzysztof Halasa, Peter Zijlstra, Rodolfo Giometti,
	Thomas Gleixner
In-Reply-To: <AANLkTimmTzH8+fSYmbajqZ+hU5Ps-UZaTp_1TYzjHB6P-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On Thu, 2010-12-23 at 05:27 +0900, Kuwahara,T. wrote:
> On Wed, Dec 22, 2010 at 7:25 AM, john stultz <johnstul-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org> wrote:
> > I don't see why that would be better then adding a
> > clear new mode flag?
> 
> In short, time step is a special case of time slew.  Those are the same,
> only different in one parameter, as is shown in my previous post.
> That's why I said there's no need for adding a new mode.

Sure, I get that a instantaneous offset adjustment could be represented
as a instantaneous slew of the same amount. But what is the benefit of
doing this? The ADJ_OFFSET isn't really a continuous function like you
describe. For instance, you can't slew time backwards. The amount you
can slow or speed up the clock is limited, so time will always increase.
So to have a magic value way outside the bound of normal operation that
does something special seems a bit obfuscated.

ADJ_SETOFFSET isn't slewing the clock. So I think its much clearer to
add a new mode, rather then to try to wedge the functionality into a
corner case of ADJ_OFFSET slewing just because one could mathematically
represent it the same way.

I see the cost of adding it as a special case as adding extra complexity
to an already complex subsystem. Doing things that make the code easier
to understand and read makes it more maintainable. And I don't see the
gain (other then saving a bit in the bit flag) to offset the complexity
of trying to squeeze this functionality into the ADJ_OFFSET mode.

thanks
-john

^ permalink raw reply

* [PATCH] irda: prevent integer underflow in IRLMP_ENUMDEVICES
From: Dan Rosenberg @ 2010-12-22 23:58 UTC (permalink / raw)
  To: samuel, davem; +Cc: netdev, security

If the user-provided len is less than the expected offset, the
IRLMP_ENUMDEVICES getsockopt will do a copy_to_user() with a very large
size value.  While this isn't be a security issue on x86 because it will
get caught by the access_ok() check, it may leak large amounts of kernel
heap on other architectures.  In any event, this patch fixes it.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: stable <stable@kernel.org>
---
 net/irda/af_irda.c |   18 +++++++++++-------
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index a6de305..241d7a0 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -2280,6 +2280,16 @@ static int irda_getsockopt(struct socket *sock, int level, int optname,
 
 	switch (optname) {
 	case IRLMP_ENUMDEVICES:
+
+		/* Offset to first device entry */
+		offset = sizeof(struct irda_device_list) -
+			sizeof(struct irda_device_info);
+
+		if (len < offset) {
+			err = -EINVAL;
+			goto out;
+		}
+
 		/* Ask lmp for the current discovery log */
 		discoveries = irlmp_get_discoveries(&list.len, self->mask.word,
 						    self->nslots);
@@ -2290,15 +2300,9 @@ static int irda_getsockopt(struct socket *sock, int level, int optname,
 		}
 
 		/* Write total list length back to client */
-		if (copy_to_user(optval, &list,
-				 sizeof(struct irda_device_list) -
-				 sizeof(struct irda_device_info)))
+		if (copy_to_user(optval, &list, offset))
 			err = -EFAULT;
 
-		/* Offset to first device entry */
-		offset = sizeof(struct irda_device_list) -
-			sizeof(struct irda_device_info);
-
 		/* Copy the list itself - watch for overflow */
 		if (list.len > 2048) {
 			err = -EINVAL;



^ permalink raw reply related

* RE: [PATCH 01/10] bna: TxRx and datapath fix
From: Rasesh Mody @ 2010-12-22 23:44 UTC (permalink / raw)
  To: David Miller; +Cc: netdev@vger.kernel.org, Debashis Dutt
In-Reply-To: <20101222.151213.115925760.davem@davemloft.net>

Hi David

>From: David Miller [mailto:davem@davemloft.net]
>Sent: Wednesday, December 22, 2010 3:12 PM
>
>From: Rasesh Mody <rmody@brocade.com>
>Date: Tue, 21 Dec 2010 23:23:07 -0800
>
>>  #define bnad_dim_timer_running(_bnad)				\
>>  	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) && 		\
>>  	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
>>
>> +#if defined(__ia64__)
>> +#define bnad_udelay	udelay
>> +#else
>> +#define bnad_udelay	__udelay
>> +#endif
>> +
>
>What in the world is this?
>
>Something like this should absolutely, positively, never be necessary.

We will send a corrective patch to fix this.

Thanks,
Rasesh Mody

^ permalink raw reply

* Re: TSO/GRO/LRO/somethingO breaks LVS on 2.6.36
From: Simon Kirby @ 2010-12-22 23:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Simon Horman, netdev
In-Reply-To: <1291863119.2795.29.camel@edumazet-laptop>

On Thu, Dec 09, 2010 at 03:51:59AM +0100, Eric Dumazet wrote:

> Le mercredi 08 d??cembre 2010 ?? 18:35 -0800, Simon Kirby a ??crit :
> 
> > Right.  So, on 2.6.35, ethtool shows:
> > 
> > # ethtool -k eth1
> > Offload parameters for eth1:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp segmentation offload: on
> > udp fragmentation offload: off
> > generic segmentation offload: on
> > large receive offload: off
> > 
> > # ethtool -k eth1.39
> > Offload parameters for eth1.39:
> > rx-checksumming: on
> > tx-checksumming: off
> > scatter-gather: off
> > tcp segmentation offload: off
> > udp fragmentation offload: off
> > generic segmentation offload: off
> > large receive offload: off
> > 
> > On 2.6.36, ethtool shows:
> > 
> > # ethtool -k eth1
> > Offload parameters for eth1:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp segmentation offload: on
> > udp fragmentation offload: off
> > generic segmentation offload: on
> > large receive offload: off
> > 
> > # ethtool -k eth1.39
> > Offload parameters for eth1.39:
> > rx-checksumming: on
> > tx-checksumming: on
> > scatter-gather: on
> > tcp segmentation offload: on
> > udp fragmentation offload: off
> > generic segmentation offload: on
> > large receive offload: off
> > 
> > And if I set it with ethtool -K eth1.39 gso off; ethtool -K eth1 gso off,
> > it says "off", but I still see merged frames, as verified with an nc <
> > /dev/zero and tcpdump -i any -n 'length > 1500'.
> 
> Yep, but you also have "tso on" by default on 2.6.36 and eth1.39 (vlan)

and turning it off makes no difference, either.  Even if I down/up the
interface after.  I still see merged packets on receive, even if I try to
use ethtool to turn off everything that is "on".  It's as if the new code
to enable it on vlans works, but the disable from ethtool does not.

In other words, I cannot find a way to disable it (and thus make LVS
work) on 2.6.36 or 2.6.37-rc6.

One interesting note is that if I destroy and recreate the VLAN interface,
it comes back with gso enabled:

# ethtool -K eth1 gso off
# ethtool -K eth1.39 gso off
# ethtool -K eth1 tso off
# ethtool -K eth1.39 tso off
# ifdown eth1.39
Removed VLAN -:eth1.39:-
# ifconfig eth1 down
# ifup eth1.39
Added VLAN with VID == 39 to IF -:eth1:-
# ethtool -k eth1
Offload parameters for eth1:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
large receive offload: off
# ethtool -k eth1.39
Offload parameters for eth1.39:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off
# ethtool -K eth1.39 gso off
# ethtool -k eth1.39
Offload parameters for eth1.39:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: off
large receive offload: off
# ifdown eth1.39; ifup eth1.39
Removed VLAN -:eth1.39:-
Added VLAN with VID == 39 to IF -:eth1:-
# ethtool -k eth1.39
Offload parameters for eth1.39:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: off
udp fragmentation offload: off
generic segmentation offload: on
large receive offload: off

So, the bug seems to related using it on VLANs.

Simon-

> > > I will prepare a backport of the "ipvs: allow transmit of GRO aggregated
> > > skbs" patch to v2.6.36 and post it shortly.  Testing to see if that
> > > resolves the problem that you are seeing would probably be a good start.
> > 
> > I can test this in our case, and it should specifically work on the
> > servers we were seeing the problem on, but it still needs some
> > complicated logic to be safe for all cases (like when LVS decides to
> > route out an interface without GRO).  Anyway, thanks!
> > 
> > Simon-
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] tcp: use RTAX_CWND for outgoing connections properly
From: Jiri Kosina @ 2010-12-22 23:33 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, linux-kernel, vojtech
In-Reply-To: <20101222.152743.28800942.davem@davemloft.net>

On Wed, 22 Dec 2010, David Miller wrote:

> >> > Besides that, any comments on this, please?
> >> 
> >> It's in my backlog, patience is a virtue.
> > 
> > Please just disregard it, it's not needed with newer kernels. Thanks,
> 
> Was it indeed fixed by commit 86bcebafc5e7f5?

Yes, exactly. It just wasn't obvious enough when looking at the code 
(hence my followup patch which hopefully makes the code more readable on 
a first sight).

Thanks and sorry for the noise.

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH] tcp: use RTAX_CWND for outgoing connections properly
From: David Miller @ 2010-12-22 23:27 UTC (permalink / raw)
  To: jkosina; +Cc: netdev, eric.dumazet, linux-kernel, vojtech
In-Reply-To: <alpine.LNX.2.00.1012230023490.16569@pobox.suse.cz>

From: Jiri Kosina <jkosina@suse.cz>
Date: Thu, 23 Dec 2010 00:26:21 +0100 (CET)

> On Wed, 22 Dec 2010, David Miller wrote:
> 
>> > Besides that, any comments on this, please?
>> 
>> It's in my backlog, patience is a virtue.
> 
> Please just disregard it, it's not needed with newer kernels. Thanks,

Was it indeed fixed by commit 86bcebafc5e7f5?

^ permalink raw reply

* Re: [PATCH] pch_can: Fix array miss-pointing issue
From: David Miller @ 2010-12-22 23:26 UTC (permalink / raw)
  To: tomoya-linux-ECg8zkTtlr0C6LszWs/t0g
  Cc: andrew.chih.howe.khor-ral2JQCrhuEAvxtiuMwx3w,
	qi.wang-ral2JQCrhuEAvxtiuMwx3w, netdev-u79uwXL29TY76Z2rM5mHXA,
	yong.y.wang-ral2JQCrhuEAvxtiuMwx3w,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	socketcan-core-0fE9KPoRgkgATYTw5x5z8w,
	kok.howg.ewe-ral2JQCrhuEAvxtiuMwx3w,
	joel.clark-ral2JQCrhuEAvxtiuMwx3w, wg-5Yr1BZd7O62+XT7JhA+gdA
In-Reply-To: <1293022839-3202-1-git-send-email-tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

From: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>
Date: Wed, 22 Dec 2010 22:00:39 +0900

> 
> Signed-off-by: Tomoya MORINAGA <tomoya-linux-ECg8zkTtlr0C6LszWs/t0g@public.gmane.org>

Applied.

^ permalink raw reply

* Re: [PATCH] tcp: use RTAX_CWND for outgoing connections properly
From: Jiri Kosina @ 2010-12-22 23:26 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, eric.dumazet, linux-kernel, Vojtech Pavlik
In-Reply-To: <20101222.115721.39176015.davem@davemloft.net>

On Wed, 22 Dec 2010, David Miller wrote:

> > Besides that, any comments on this, please?
> 
> It's in my backlog, patience is a virtue.

Please just disregard it, it's not needed with newer kernels. Thanks,

-- 
Jiri Kosina
SUSE Labs, Novell Inc.

^ permalink raw reply

* Re: [PATCH] atl1c: Do not use legacy PCI power management
From: David Miller @ 2010-12-22 23:26 UTC (permalink / raw)
  To: rjw; +Cc: netdev, jcliburn, chris.snook, jie.yang
In-Reply-To: <201012221407.52615.rjw@sisk.pl>

From: "Rafael J. Wysocki" <rjw@sisk.pl>
Date: Wed, 22 Dec 2010 14:07:52 +0100

> From: Rafael J. Wysocki <rjw@sisk.pl>
> 
> The atl1c driver uses the legacy PCI power management, so it has to
> do some PCI-specific things in its ->suspend() and ->resume()
> callbacks and they are not done correctly.
> 
> Convert atl1c to the new PCI power management framework and make it
> let the PCI subsystem handle all of the PCI-specific aspects of
> device handling during system power transitions.
> 
> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Applied to net-2.6, thank you.

^ permalink raw reply

* Re: [PATCH v6] kptr_restrict for hiding kernel pointers
From: Joe Perches @ 2010-12-22 23:13 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: linux-kernel, netdev, linux-security-module, jmorris,
	eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis
In-Reply-To: <1293058439.9820.317.camel@dan>

On Wed, 2010-12-22 at 17:53 -0500, Dan Rosenberg wrote:
> Add the %pK printk format specifier and
> the /proc/sys/kernel/kptr_restrict sysctl.

trivial comments.

> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
[]
> @@ -201,6 +201,8 @@ extern int sscanf(const char *, const char *, ...)
>  extern int vsscanf(const char *, const char *, va_list)
>  	__attribute__ ((format (scanf, 2, 0)));
>  
> +extern int kptr_restrict;	/* for sysctl */

I think this extern should go into printk.h

> diff --git a/lib/vsprintf.c b/lib/vsprintf.c
[]
> @@ -1035,6 +1038,30 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
>  		return buf + vsnprintf(buf, end - buf,
>  				       ((struct va_format *)ptr)->fmt,
>  				       *(((struct va_format *)ptr)->va));
> +	case 'K':
[]
> +		else if (!kptr_restrict)
> +			break;		/* %pK does not obscure pointers */
> +
> +		else if ((kptr_restrict != 2) &&
> +		    has_capability_noaudit(current, CAP_SYSLOG))
> +			break;		/* privileged apps expose pointers,
> +					   unless kptr_restrict is 2 */
> +

I think this more readable as:

		else if ((kptr_restrict == 0)) ||
			 (kptr_restrict == 1 &&
			  has_capability_noaudit(current, CAP_SYSLOG)))
			break;



^ permalink raw reply

* Re: [PATCH 01/10] bna: TxRx and datapath fix
From: David Miller @ 2010-12-22 23:12 UTC (permalink / raw)
  To: rmody; +Cc: netdev, ddutt
In-Reply-To: <1293002596-3239-2-git-send-email-rmody@brocade.com>

From: Rasesh Mody <rmody@brocade.com>
Date: Tue, 21 Dec 2010 23:23:07 -0800

>  #define bnad_dim_timer_running(_bnad)				\
>  	(((_bnad)->cfg_flags & BNAD_CF_DIM_ENABLED) && 		\
>  	(test_bit(BNAD_RF_DIM_TIMER_RUNNING, &((_bnad)->run_flags))))
>  
> +#if defined(__ia64__)
> +#define bnad_udelay	udelay
> +#else
> +#define bnad_udelay	__udelay
> +#endif
> +

What in the world is this?

Something like this should absolutely, positively, never be necessary.


^ permalink raw reply

* [PATCH v6] kptr_restrict for hiding kernel pointers
From: Dan Rosenberg @ 2010-12-22 22:53 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-security-module
  Cc: jmorris, eric.dumazet, tgraf, eugeneteo, kees.cook, mingo, davem,
	a.p.zijlstra, akpm, eparis

Add the %pK printk format specifier and
the /proc/sys/kernel/kptr_restrict sysctl.

The %pK format specifier is designed to hide exposed kernel pointers,
specifically via /proc interfaces. Exposing these pointers provides an
easy target for kernel write vulnerabilities, since they reveal the
locations of writable structures containing easily triggerable function
pointers. The behavior of %pK depends on the kptr_restrict sysctl.

If kptr_restrict is set to 0, no deviation from the standard %p behavior
occurs.  If kptr_restrict is set to 1, the default, if the current user
(intended to be a reader via seq_printf(), etc.) does not have
CAP_SYSLOG (currently in the LSM tree), kernel pointers using %pK are
printed as 0's.  If kptr_restrict is set to 2, kernel pointers using %pK
are printed as 0's regardless of privileges.  Replacing with 0's was
chosen over the default "(null)", which cannot be parsed by userland %p,
which expects "(nil)".


v6 removes the WARN_ONCE in favor of returning "pK-error" to avoid
breaking in certain cases, thanks to Ingo Molnar.

v5 sets kptr_restrict to a default value of 1, and properly handles the
case where it's incorrectly used in IRQ context. 

v4 incorporates Eric Paris' suggestion of using
has_capability_noaudit(), since failing this capability check is not a
policy violation but rather a code path choice and shouldn't generate
potentially excessive log noise.  Adjusted IRQ comment for clarity.

v3 adds the "2" setting, cleans up documentation, removes the CONFIG,
and incorporates changes and suggestions from Andrew Morton.

v2 improves checking for inappropriate context, on suggestion by Peter
Zijlstra.  Thanks to Thomas Graf for suggesting use of a centralized
format specifier.


Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Cc: James Morris <jmorris@namei.org>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Thomas Graf <tgraf@infradead.org>
Cc: Eugene Teo <eugeneteo@kernel.org>
Cc: Kees Cook <kees.cook@canonical.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David S. Miller <davem@davemloft.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Eric Paris <eparis@parisplace.org>
---
 Documentation/sysctl/kernel.txt |   14 ++++++++++++++
 include/linux/kernel.h          |    2 ++
 kernel/sysctl.c                 |    9 +++++++++
 lib/vsprintf.c                  |   27 +++++++++++++++++++++++++++
 4 files changed, 52 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 209e158..8ace8c4 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -34,6 +34,7 @@ show up in /proc/sys/kernel:
 - hotplug
 - java-appletviewer           [ binfmt_java, obsolete ]
 - java-interpreter            [ binfmt_java, obsolete ]
+- kptr_restrict
 - kstack_depth_to_print       [ X86 only ]
 - l2cr                        [ PPC only ]
 - modprobe                    ==> Documentation/debugging-modules.txt
@@ -261,6 +262,19 @@ This flag controls the L2 cache of G3 processor boards. If
 
 ==============================================================
 
+kptr_restrict:
+
+This toggle indicates whether restrictions are placed on
+exposing kernel addresses via /proc and other interfaces.  When
+kptr_restrict is set to (0), there are no restrictions.  When
+kptr_restrict is set to (1), the default, kernel pointers
+printed using the %pK format specifier will be replaced with 0's
+unless the user has CAP_SYSLOG.  When kptr_restrict is set to
+(2), kernel pointers printed using %pK will be replaced with 0's
+regardless of privileges.
+
+==============================================================
+
 kstack_depth_to_print: (X86 only)
 
 Controls the number of words to print when dumping the raw
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b6de9a6..b4f4863 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -201,6 +201,8 @@ extern int sscanf(const char *, const char *, ...)
 extern int vsscanf(const char *, const char *, va_list)
 	__attribute__ ((format (scanf, 2, 0)));
 
+extern int kptr_restrict;	/* for sysctl */
+
 extern int get_option(char **str, int *pint);
 extern char *get_options(const char *str, int nints, int *ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 5abfa15..236fa91 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -713,6 +713,15 @@ static struct ctl_table kern_table[] = {
 	},
 #endif
 	{
+		.procname	= "kptr_restrict",
+		.data		= &kptr_restrict,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &zero,
+		.extra2		= &two,
+	},
+	{
 		.procname	= "ngroups_max",
 		.data		= &ngroups_max,
 		.maxlen		= sizeof (int),
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index c150d3d..f0fcb31 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -936,6 +936,8 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
 	return string(buf, end, uuid, spec);
 }
 
+int kptr_restrict = 1;
+
 /*
  * Show a '%p' thing.  A kernel extension is that the '%p' is followed
  * by an extra set of alphanumeric characters that are extended format
@@ -979,6 +981,7 @@ char *uuid_string(char *buf, char *end, const u8 *addr,
  *       Implements a "recursive vsnprintf".
  *       Do not use this feature without some mechanism to verify the
  *       correctness of the format string and va_list arguments.
+ * - 'K' For a kernel pointer that should be hidden from unprivileged users
  *
  * Note: The difference between 'S' and 'F' is that on ia64 and ppc64
  * function pointers are really function descriptors, which contain a
@@ -1035,6 +1038,30 @@ char *pointer(const char *fmt, char *buf, char *end, void *ptr,
 		return buf + vsnprintf(buf, end - buf,
 				       ((struct va_format *)ptr)->fmt,
 				       *(((struct va_format *)ptr)->va));
+	case 'K':
+		/*
+		 * %pK cannot be used in IRQ context because its test
+		 * for CAP_SYSLOG would be meaningless.
+		 */
+		if (in_irq() || in_serving_softirq() || in_nmi()) {
+			if (spec.field_width == -1)
+				spec.field_width = 2 * sizeof(void *);
+			return string(buf, end, "pK-error", spec);
+		}
+
+		else if (!kptr_restrict)
+			break;		/* %pK does not obscure pointers */
+
+		else if ((kptr_restrict != 2) &&
+		    has_capability_noaudit(current, CAP_SYSLOG))
+			break;		/* privileged apps expose pointers,
+					   unless kptr_restrict is 2 */
+
+		if (spec.field_width == -1) {
+			spec.field_width = 2 * sizeof(void *);
+			spec.flags |= ZEROPAD;
+		}
+		return number(buf, end, 0, spec);
 	}
 	spec.flags |= SMALL;
 	if (spec.field_width == -1) {





^ permalink raw reply related

* Re: [PATCH v5] kptr_restrict for hiding kernel pointers
From: Valdis.Kletnieks @ 2010-12-22 21:43 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: Ingo Molnar, linux-kernel, netdev, linux-security-module, jmorris,
	eric.dumazet, tgraf, eugeneteo, kees.cook, davem, a.p.zijlstra,
	akpm, eparis
In-Reply-To: <1293038279.9820.250.camel@dan>

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]

On Wed, 22 Dec 2010 12:17:59 EST, Dan Rosenberg said:
> On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> > * Dan Rosenberg <drosenberg@vsecurity.com> wrote:
> > 
> > > +	case 'K':
> > > +		/*
> > > +		 * %pK cannot be used in IRQ context because its test
> > > +		 * for CAP_SYSLOG would be meaningless.
> > > +		 */
> > > +		if (in_irq() || in_serving_softirq() || in_nmi())
> > > +			WARN_ONCE(1, "%%pK used in interrupt context.\n");
> > 
> > Hm, that bit looks possibly broken - some useful warning in irq context could print
> > a pointer into the syslog and this would generate a second warning? That probably
> > would crash as it recurses back into the printk code?

> The double "%%" acts as an escape and simply prints "%" rather than
> treating it as a format specifier.

I think Ingo was more worried about the fact that we're doing a WARN_ONCE which
will generate a call to printk() - while we're in the middle of a printk() already.

So if we hit a 'printk(KERN_INFO "Some blather with a %pK pointer in it",ptr) in irq
context, what we'll get (if we're lucky is:

Some blather with a <50-60 lines of WARN_ONCE output> pointer in it.

If we're unlucky? Well... 


[-- Attachment #2: Type: application/pgp-signature, Size: 227 bytes --]

^ permalink raw reply

* Re: [PATCH v5] kptr_restrict for hiding kernel pointers
From: Ingo Molnar @ 2010-12-22 21:26 UTC (permalink / raw)
  To: Dan Rosenberg
  Cc: linux-kernel, netdev, linux-security-module, jmorris,
	eric.dumazet, tgraf, eugeneteo, kees.cook, davem, a.p.zijlstra,
	akpm, eparis
In-Reply-To: <1293039332.9820.262.camel@dan>


* Dan Rosenberg <drosenberg@vsecurity.com> wrote:

> On Wed, 2010-12-22 at 18:13 +0100, Ingo Molnar wrote:
> > * Dan Rosenberg <drosenberg@vsecurity.com> wrote:
> > 
> > > +	case 'K':
> > > +		/*
> > > +		 * %pK cannot be used in IRQ context because its test
> > > +		 * for CAP_SYSLOG would be meaningless.
> > > +		 */
> > > +		if (in_irq() || in_serving_softirq() || in_nmi())
> > > +			WARN_ONCE(1, "%%pK used in interrupt context.\n");
> > 
> > Hm, that bit looks possibly broken - some useful warning in irq context could print 
> > a pointer into the syslog and this would generate a second warning? That probably 
> > would crash as it recurses back into the printk code?
> > 
> 
> I don't see a reason to ever use %pK to print to the syslog, since
> reading it is now optionally protected with dmesg_restrict, and
> stripping pointers from the syslog will cripple any post-mortem
> debugging for everyone.  I understand the desire to prevent things from
> breaking even if it's used incorrectly, but I'm not really convinced
> that this would break anything even in this scenario.  The WARN_ONCE
> will prevent any unbounded recursion.  I'm just not clear on how this
> could cause a crash.

It's a simple QOI issue. We simply do not add kernel facilities that can produce a 
stack overflow, memory corruption and triple fault if a rare debug statement 
triggers in an IRQ context by accident:

	printk(KERN_WARN "driver bar: bug foo in function %pK\n");

> > Instead a warning could be inserted into the generated output instead, for 
> > example 'pK-error' (carefully staying within pointer length limits).
> 
> If it's used in IRQ context and its output needs to be read by a
> userspace utility using %p to parse, this will break it.

Didnt you just say that it should not be used from IRQ context? There wont be any 
user-space tool to read it - it's a simple robustness change: the warning as you 
implemented it can crash the system. I suggested an implementation that would emit 
the warning in a more robust way.

Thanks,

	Ingo

^ permalink raw reply

* Re: [PATCH v4] kptr_restrict for hiding kernel pointers
From: Andrew Morton @ 2010-12-22 20:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Eric Dumazet, Dan Rosenberg, linux-kernel, netdev,
	linux-security-module, jmorris, tgraf, eugeneteo, kees.cook,
	davem, a.p.zijlstra, eparis, Linus Torvalds
In-Reply-To: <20101222162118.GC20358@elte.hu>

On Wed, 22 Dec 2010 17:21:18 +0100
Ingo Molnar <mingo@elte.hu> wrote:

> 
> * Eric Dumazet <eric.dumazet@gmail.com> wrote:
> 
> > Le mercredi 22 d__cembre 2010 __ 08:13 -0500, Dan Rosenberg a __crit :
> > > > Hm, why is it off by default? Is there some user-space regression that is caused by 
> > > > this?
> > > > 
> > > > We really want good security measures to be active by default (and to work by 
> > > > default) - they are not worth much if they are not.
> > > > 
> > > 
> > > I agree entirely, but I've received a lot of resistance to these types
> > > of changes in net.  I'm afraid that if it's enabled by default, no one
> > > will actually allow use of the %pK specifier where it should be used.
> > > 
> > 
> > Actually, "net resistance" was against your first patches, using quick
> > and dirty techniques (Should I remind you some of them ?)
> > 
> > Now you have a helper, it should be easier to integrate the changes.
> 
> Great - if the concept itself wasnt objected to then i think we should flip the 
> default to on.
> 

Yes, I'll make that change.  If we get reports of breakage during
2.6.38-rcX then we can reconsider.  But if we leave the feature
disabled, we will never know...  

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox