* [RFC][PATCH 0/5] memcg: reduce lock contention
From: KAMEZAWA Hiroyuki @ 2009-08-28  4:20 UTC (permalink / raw)
  To: linux-mm@kvack.org
  Cc: linux-kernel@vger.kernel.org, balbir@linux.vnet.ibm.com,
	nishimura@mxp.nes.nec.co.jp

Hi,

Recently, contention on memcg's res_counter->lock was reported on big servers, and
Balbir wrote a workaround for the root memcg.
That's good, but we need a fix for children, too.

This set reduces lock contention for memcg's child cgroups. It is based on mmotm-Aug27.

I'm sorry, I only have an 8-CPU machine and can't reproduce the really troublesome lock contention.
Here is the lock_stat of make -j 12 on my 8-CPU box, before and after this patch series.

[Before] time make -j 12 (best time of 3 runs)
real    2m55.170s
user    4m38.351s
sys     6m40.694s
lock_stat version 0.3
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                              class name    con-bounces    contentions   waittime-min   waittime-max waittime-total    acq-bounces   acquisitions   holdtime-min   holdtime-max holdtime-total
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

                          &counter->lock:       1793728        1824383           0.90       16599.78     1255869.40       24879507       44909568           0.45       31183.88    19505982.15
                          --------------
                          &counter->lock         999561          [<ffffffff81099224>] res_counter_charge+0x94/0x140
                          &counter->lock         824822          [<ffffffff8109911c>] res_counter_uncharge+0x3c/0xb0
                          --------------
                          &counter->lock         835597          [<ffffffff8109911c>] res_counter_uncharge+0x3c/0xb0
                          &counter->lock         988786          [<ffffffff81099224>] res_counter_charge+0x94/0x140

you can see this by "head" ;)

[After] time make -j 12 (best time of 3 runs, but the score was very stable)
real    2m52.612s
user    4m45.450s
sys     6m4.422s

                          &counter->lock:         11159          11406           1.02          30.35        6707.74        1097940        3957860           0.47       17652.17     1534430.74
                          --------------
                          &counter->lock           2016          [<ffffffff810991bd>] res_counter_charge+0x4d/0x110
                          &counter->lock           9390          [<ffffffff81099115>] res_counter_uncharge+0x35/0x90
                          --------------
                          &counter->lock           8962          [<ffffffff81099115>] res_counter_uncharge+0x35/0x90
                          &counter->lock           2444          [<ffffffff810991bd>] res_counter_charge+0x4d/0x110

dcache-lock, zone->lru_lock, etc. are much heavier than this.


I expect good results on big servers.

But this patch series is a "big change". I (and the memcg folks) have to be careful...
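
To put the two lock_stat rows above side by side, here is a small sketch that
computes the share of acquisitions that contended and the before/after ratio.
The struct and function names are made up for this illustration; only the
numbers come from the lock_stat output above.

```c
#include <assert.h>
#include <stdio.h>

/* The two lock_stat columns used here, for the &counter->lock row. */
struct lock_row {
	unsigned long contentions;
	unsigned long acquisitions;
};

/* Values copied from the [Before] and [After] lock_stat output. */
static const struct lock_row before_row = { 1824383UL, 44909568UL };
static const struct lock_row after_row  = { 11406UL, 3957860UL };

/* Percentage of acquisitions that had to wait for the lock. */
static double contention_pct(const struct lock_row *r)
{
	assert(r->acquisitions != 0);
	return 100.0 * (double)r->contentions / (double)r->acquisitions;
}

/* How many times fewer contentions "after" shows compared to "before". */
static double reduction_factor(const struct lock_row *before,
			       const struct lock_row *after)
{
	assert(after->contentions != 0);
	return (double)before->contentions / (double)after->contentions;
}

/* Print a short summary of both rows. */
static void print_summary(void)
{
	printf("before: %.2f%% of acquisitions contended\n",
	       contention_pct(&before_row));
	printf("after:  %.2f%% of acquisitions contended\n",
	       contention_pct(&after_row));
	printf("contentions reduced by a factor of %.0f\n",
	       reduction_factor(&before_row, &after_row));
}
```

With these inputs, roughly 4% of acquisitions contended before the series and
roughly 0.3% after, and the contention count drops by a factor of about 160.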


Thanks,
-Kame





^ permalink raw reply

* Re: [ath5k-devel] [PATCH 1/2] ath5k: fix uninitialized value use in ath5k_eeprom_read_turbo_modes()
From: Nick Kossifidis @ 2009-08-28  4:17 UTC (permalink / raw)
  To: Luis R. Rodriguez
  Cc: Bob Copeland, Pavel Roskin, ath5k-devel, linux-wireless,
	John W. Linville
In-Reply-To: <43e72e890908272057s28ee11c6x624e8a08c6bd17e9@mail.gmail.com>

2009/8/28 Luis R. Rodriguez <mcgrof@gmail.com>:
> On Thu, Aug 27, 2009 at 8:01 PM, Nick Kossifidis<mickflemm@gmail.com> wrote:
>> 2009/8/27 Luis R. Rodriguez <mcgrof@gmail.com>:
>>> On Thu, Aug 27, 2009 at 11:17 AM, Bob Copeland<bcopeland@gmail.com> wrote:
>>>> On Thu, Aug 27, 2009 at 8:58 AM, Nick Kossifidis<mickflemm@gmail.com> wrote:
>>>>> 2009/8/27 Pavel Roskin <proski@gnu.org>:
>>>>
>>>>> Current code works fine (i 've checked it against various cards),
>>>>> there is nothing wrong
>>>>> with having another function for reading turbo modes, i find it's
>>>>> cleaner that way.
>>>>
>>>> Well, we also don't use the turbo modes at all and that's where the
>>>> error is (IIRC) so it shouldn't have any impact. :)
>>>
>>> Again, why don't we just remove all that fucking turbo cruft?
>>>
>>>  Luis
>>>
>>
>> Why should we remove it, we are discussing on implementing channel
>> width setting for 5 and 10 MHz channels already so where is the
>> problem supporting turbo mode (40MHz) ?
>
> Supporting 5 MHz and 10 MHz channels is very different than supporting
> Turbo (40 MHz). 5 MHz and 10 MHz channels seems to be something you
> can use as per 802.11, 40 MHz "Turbo" stuff is just a vendor extension
> and at least by my part I don't want to move a finger to either
> support it nor do I think its a good idea to support it. Other people
> have objected to vendor extensions before on mac80211 so I don't think
> you'll find much support for this from a lot of people.
>

Many people use turbo mode, and it's not an ugly proprietary extension. Static
turbo mode is close to just having 40MHz channels, so we can switch to it the
same way as with 5 and 10MHz channels. The difficult part is dynamic turbo
(supporting 20MHz and 40MHz stations at the same time), but we don't deal with
it anyway (and it's only useful in AP mode).

Most of the code is there. We are ready to support 5/10/40MHz channels on the
driver side as soon as we are done with cfg80211/mac80211 compatibility, so
why drop it?

> The way I see is if you want vendor extensions like Atheros Turbo or
> XR use MadWifi.
>

XR is not supported on MadWiFi; I remember only some parts were supported. We
also don't have much to work with (XR code is missing from Legacy and Sam's
HAL), and not many people use it, so I agree we can drop it. But it's nothing
like turbo mode; as I said, many people use that.

>> Also EEPROM code should read the eeprom and fill the structs, since
>> these infos are there we should read them, i don't see any reason to
>> skip them
>
> I didn't see Bob's patch remove that stuff. Its pointless to use it though.
>

It can be confusing (offsets missing etc.), and we might want to put EEPROM
info in debugfs (I had an ugly patch just for that) for debugging.

>> i thought our goal was to support this hw as much as
>> possible,
>
> We should support users as best as possible, whether or not you
> support vendor extensions is an entirely different story.
>
>> if we want to get rid of MadWiFi we 'll have to at least
>> support 5, 10 and 40MHz (turbo) channels.
>
> I don't want to get rid of MadWifi, what we have now is a proper
> upstream replacement. MadWifi is still a hack put together, and people
> who want hacks can use that.
>

Having multiple drivers won't help users. I thought that MadWiFi was "dead"
and we were working on a complete alternative.



-- 
GPG ID: 0xD21DB2DB
As you read this post global entropy rises. Have Fun ;-)
Nick

^ permalink raw reply

* Re: xfs data loss
From: Eric Sandeen @ 2009-08-28  4:16 UTC (permalink / raw)
  To: Passerone, Daniele; +Cc: xfs@oss.sgi.com
In-Reply-To: <B9A7B002C7FAFC469D4229539E909760308DA651DE@DU-EXC-MAIL.empa.emp-eaw.ch>

Passerone, Daniele wrote:
> Dear xfs developers
> 
> We have a SUN X4500 with 48 500 GB drives, that we configured under SUSE SLES 10.
> 
> Among others, we have 3 RAID5 xfs filesystems, /dev/md4 with 20 units (9.27 TB)
> /dev/md5 with 20 units (9.27 TB) and /dev/md6 with 5 units (1.95 TB)
> 
> These units are not backed up.
> 
> Due to a power shock, suddenly and without log messages about one half (5 TB) of the user 
> directories on /dev/md4 have disappeared.

I presume you mean after a reboot?

> Upon reboot, /dev/md6 showed only 3 units, and after a xfs_repair it was again ok.
> /dev/md4 mounted immediately, but always with one half of the directories.

Were the lost directories recently created?  I've never heard of 
untouched, existing directories disappearing after a power loss...

> What can I do? Any help would be appreciated, I would really be happy to recover those files...
> :)

Not much to go on here I'm afraid.  SLES10 is an old kernel, but it's 
supported by SuSE at least.

-Eric

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

^ permalink raw reply

* Re: [PATCH]: sparc64: Validate linear D-TLB misses.
From: David Miller @ 2009-08-28  4:14 UTC (permalink / raw)
  To: sparclinux
In-Reply-To: <20090824.232349.236950384.davem@davemloft.net>

From: Jim Gifford <maillist@jg555.com>
Date: Thu, 27 Aug 2009 17:18:39 -0700

> Spoke too soon, it's started again. So that patch has only delayed the
> issue.

It wasn't meant to fix your bug so this is no surprise
to me.

^ permalink raw reply

* Re: [BUG] lockup with the latest kernel
From: Linus Torvalds @ 2009-08-28  4:05 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Steven Rostedt, LKML, Thomas Gleixner, Peter Zijlstra,
	Ingo Molnar, Andrew Morton
In-Reply-To: <4A9744F4.7010208@kernel.org>



On Fri, 28 Aug 2009, Tejun Heo wrote:
> > 
> >     x86: make x86_32 use tlb_64.c
> >     
> >     Impact: less contention when issuing invalidate IPI, cleanup
> >     
> >     Make x86_32 use the same tlb code as 64bit.  The 64bit code uses
> >     multiple IPI vectors for tlb shootdown to reduce contention.  This
> >     patch makes x86_32 allocate the same 8 IPIs as x86_64 and share the
> >     code paths.
> >     
> >     Note that the usage of asmlinkage is inconsistent for x86_32 and 64
> >     and calls for further cleanup.  This has been noted with a FIXME
> >     comment in tlb_64.c.
> >     
> >     Signed-off-by: Tejun Heo <tj@kernel.org>
> > 
> > I can easily hit this bug at this commit, but I ran for a week on the 
> > commit before it. Thus I'm assuming this is the bug (but I'm not 100% 
> > sure).
> 
> Drat, why does it have to be mine?  ;-)
> 
> Joke aside, thank you very much for bisecting it.
> 
> ...
> >> [13288.222084] EIP: 0060:[<c0110821>] EFLAGS: 00000002 CPU: 0
> >> [13288.222084] EIP is at default_send_IPI_mask_logical+0x53/0x92

Is this one perhaps fixed by b04e6373d694 ("x86: don't call 
'->send_IPI_mask()' with an empty mask")?

It sounds a _lot_ like that bug. Older dual-cpu x86 box, and APIC getting 
confused by the occasional empty CPU mask, and then subsequent IPIs will 
hang.

		Linus

^ permalink raw reply

* [Qemu-devel] support for dual cortex - A9 ??
From: Neo Techie @ 2009-08-28  4:06 UTC (permalink / raw)
  To: qemu-devel

I want to use qemu to emulate a dual Cortex-A9 ARM processor.

Processor family: Cortex
Architecture version: ARMv7-A
Core: dual Cortex-A9

Please let me know whether qemu supports the full instruction set of this
processor. If not, how much of it is supported?

Thank you

^ permalink raw reply

* [ath9k-devel] new rfkill causes intermittent transmission
From: Kevin Mitchell @ 2009-08-28  4:03 UTC (permalink / raw)
  To: ath9k-devel

I am running ath9k on an ar5418 in a Thinkpad T60. My AP is a Dlink
DIR-655. I have bisected this regression down to the big rfkill
rewrite 19d337dff95cbf76edd3ad95c0cee2732c3e1ec5. Starting with this
commit, I find that transmitting and receiving cycle between working
and not working at ~1 Hz. As this is a rather large commit, I have
further pinpointed the problem down to the call to ath_radio_enable()
in ath9k_rfkill_poll_state(). In particular, if I comment out

	if (blocked)
		ath_radio_disable(sc);
	else
		ath_radio_enable(sc);

everything works fine again with the likely exception of rfkill.

At this stage, my understanding becomes sketchy, but I dug into
ath_radio_enable() and found everything worked if I commented out

	spin_lock_bh(&sc->sc_resetlock);
	r = ath9k_hw_reset(ah, ah->curchan, false);
	if (r) {
		DPRINTF(sc, ATH_DBG_FATAL,
			"Unable to reset channel %u (%uMhz) ",
			"reset status %d\n",
			channel->center_freq, r);
	}
	spin_unlock_bh(&sc->sc_resetlock);

Again, I admit complete ignorance at such a low level, but isn't it
kind of severe to be continually resetting the hardware inside a
polling loop?
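
One way to avoid that, sketched here under assumptions: remember the last
rfkill state and only touch the radio when it changes. The names below
(struct radio_state, rfkill_poll(), radio_enable(), radio_disable()) are
hypothetical stand-ins, not the ath9k functions, and this is not necessarily
how the driver should be fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the real struct ath_softc. */
struct radio_state {
	bool known;       /* false until the first poll has run */
	bool blocked;     /* last rfkill state that was acted upon */
	unsigned resets;  /* counts simulated hardware resets */
};

/* Stand-in for ath_radio_enable(): pretend enabling resets the hardware. */
static void radio_enable(struct radio_state *rs)
{
	rs->resets++;
}

/* Stand-in for ath_radio_disable(): nothing to simulate here. */
static void radio_disable(struct radio_state *rs)
{
	(void)rs;
}

/* Poll handler that touches the hardware only on a state transition. */
static void rfkill_poll(struct radio_state *rs, bool blocked)
{
	assert(rs != NULL);

	if (rs->known && rs->blocked == blocked)
		return;  /* no change since the last poll: leave the radio alone */

	rs->known = true;
	rs->blocked = blocked;
	if (blocked)
		radio_disable(rs);
	else
		radio_enable(rs);
}
```

With this shape, polling an unblocked switch repeatedly triggers one enable
(and so one reset) instead of one per poll.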

On a side note, I find that if I comment out only the ath9k_hw_reset,
but leave the spinlock/unlock, the cycling functionality of TX/RX is
no longer present, but the connection seems kind of unstable. By that
I mean I can get through one default iperf -c test, but the connection
usually drops in the second.

Kevin

^ permalink raw reply

* Re: [RFC] Infrared Keycode standardization
From: Jarod Wilson @ 2009-08-28  4:00 UTC (permalink / raw)
  To: Devin Heitmueller
  Cc: Mauro Carvalho Chehab, Ville Syrjälä,
	Linux Media Mailing List, Linux Input
In-Reply-To: <829197380908271506i251b47caoe8c08d483e78e938@mail.gmail.com>

On Thursday 27 August 2009 18:06:51 Devin Heitmueller wrote:
> On Thu, Aug 27, 2009 at 5:58 PM, Mauro Carvalho
> Chehab<mchehab@infradead.org> wrote:
> > Em Thu, 27 Aug 2009 21:36:36 +0300
> > Ville Syrjälä <syrjala@sci.fi> escreveu:
> >
> >
> >> I welcome this effort. It would be nice to have some kind of consistent
> >> behaviour between devices. But just limiting the effort to IR devices
> >> doesn't make sense. It shouldn't matter how the device is connected.
> >
> > Agreed.
> >
> >>
> >> FASTFORWARD,REWIND,FORWARD and BACK aren't very clear. To me it would
> >> make most sense if FASTFORWARD and REWIND were paired and FORWARD and
> >> BACK were paired. I actually have those two a bit confused in
> >> ati_remote2 too where I used FASTFORWARD and BACK. I suppose it should
> >> be REWIND instead.
> >
> > Makes sense. I updated it at the wiki. I also tried to group the keycodes by
> > function there.
> >
> >> Also I should probably use ZOOM for the maximize/restore button (it's
> >> FRONT now), and maybe SETUP instead of ENTER for another. It has a
> >> picture of a checkbox, Windows software apparently shows a setup menu
> >> when it's pressed.
> >>
> >> There are also a couple of buttons where no keycode really seems to
> >> match. One is the mouse button drag. I suppose I could implement the
> >> drag lock feature in the driver but I'm not sure if that's a good idea.
> >> It would make that button special and unmappable. Currently I have that
> >> mapped to EDIT IIRC.
> >
> > I'm not sure what we should do with those buttons.
> >
> > Probably, the most complete IR spec is the RC5 codes:
> >        http://c6000.spectrumdigital.com/davincievm/revf/files/msp430/rc5_codes.pdf
> > (not sure if this table is complete or accurate, but on a search I did
> > today, this is the one that gave me a better documentation)
> >
> > I suspect that, after solving the most used cases, we'll need to take a better look on it,
> > identifying the missing cases of the real implementations and add them to input.h.
> >
> >> The other oddball button has a picture of a stopwatch (I think, it's
> >> not very clear). Currently it uses COFFEE, but maybe TIMER or something
> >> like that should be added. The Windows software's manual just say it
> >> toggles TV-on-demand, but I have no idea what that actually is.
> >
> > Hmm... Maybe TV-on-demand is another name for pay-per-view?
> >
> >
> >
> > Cheers,
> > Mauro
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-media" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> 
> Since we're on the topic of IR support, there are probably a couple of
> other things we may want to be thinking about if we plan on
> refactoring the API at all:
> 
> 1.  The fact that for RC5 remote controls, the tables in ir-keymaps.c
> only have the second byte.  In theory, they should have both bytes
> since the vendor byte helps prevent receiving spurious commands from
> unrelated remote controls.  We should include the ability to "ignore
> the vendor byte" so we can continue to support all the remotes
> currently in the ir-keymaps.c where we don't know what the vendor byte
> should contain.
> 
> 2.  The fact that the current API provides no real way to change the
> mode of operation for the IR receiver, for those receivers that
> support multiple modes (NEC/RC5/RC6).  While you have the ability to
> change the mapping table from userland via the keytable program, there
> is currently no way to tell the IR receiver which mode to operate in.
> 
> One would argue that the above keymaps structure should include new
> fields to indicate what type of remote it is (NEC/RC5/RC6 etc), as
> well as a field to indicate that the vendor codes are absent from the
> key mapping for that remote.  Given this, I can change the dib0700
> and em28xx IR receivers to automatically set the IR capture mode
> appropriately based on which remote is in the device profile.
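
The proposed fields could look roughly like the sketch below. Everything here
is hypothetical: struct ir_keymap, struct ir_key, enum ir_proto and
ir_lookup() are names invented for illustration and do not correspond to the
existing ir-keymaps.c tables or to any agreed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical protocol tag; the mail only names NEC, RC5 and RC6. */
enum ir_proto { IR_PROTO_NEC, IR_PROTO_RC5, IR_PROTO_RC6 };

/* One table entry holding both bytes instead of only the second one. */
struct ir_key {
	uint8_t vendor;   /* first byte; meaningless if the table lacks it */
	uint8_t command;  /* second byte, the only one current tables hold */
	int keycode;      /* input-layer keycode, a plain int in this sketch */
};

/* Keymap with the two extra fields suggested above. */
struct ir_keymap {
	enum ir_proto proto;   /* lets a driver pick the receiver mode */
	bool ignore_vendor;    /* true for tables without vendor bytes */
	const struct ir_key *keys;
	size_t nkeys;
};

/* Return the keycode for a received (vendor, command) pair, or -1. */
static int ir_lookup(const struct ir_keymap *map, uint8_t vendor,
		     uint8_t command)
{
	assert(map != NULL);

	for (size_t i = 0; i < map->nkeys; i++) {
		const struct ir_key *k = &map->keys[i];

		if (k->command != command)
			continue;
		if (map->ignore_vendor || k->vendor == vendor)
			return k->keycode;
	}
	return -1;
}
```

A table with ignore_vendor set keeps today's behaviour (match on the second
byte only), while a table with known vendor bytes rejects commands from
unrelated remotes.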

Jon Smirl actually wrote some fully functional proof-of-concept IR
handling code about a year ago, that included auto-detection and auto
decoding of several protocols. Perhaps some of that is relevant and
reusable here? (I still have a copy of the tree here somewhere...)

I've been toying with the notion of extending the input device support
that was added to the lirc_imon driver a bit ago, and add a full key
map that delivers events (we already do this for mouse functionality),
but include the ability to also use the remote and/or receiver in a
raw IR mode with lircd. Wouldn't be terribly difficult I think to do
something similar for the standard MCE remotes and receivers... Just
a simple matter of some time and some code. Unfortunately, I'm a bit
short on the time part right now...

-- 
Jarod Wilson
jarod@wilsonet.com

^ permalink raw reply

* Re: [ath5k-devel] [PATCH 1/2] ath5k: fix uninitialized value use in ath5k_eeprom_read_turbo_modes()
From: Luis R. Rodriguez @ 2009-08-28  3:57 UTC (permalink / raw)
  To: Nick Kossifidis
  Cc: Bob Copeland, Pavel Roskin, ath5k-devel, linux-wireless,
	John W. Linville
In-Reply-To: <40f31dec0908272001u1a67cbf3ycc5d9588dd48915d@mail.gmail.com>

On Thu, Aug 27, 2009 at 8:01 PM, Nick Kossifidis<mickflemm@gmail.com> wrote:
> 2009/8/27 Luis R. Rodriguez <mcgrof@gmail.com>:
>> On Thu, Aug 27, 2009 at 11:17 AM, Bob Copeland<bcopeland@gmail.com> wrote:
>>> On Thu, Aug 27, 2009 at 8:58 AM, Nick Kossifidis<mickflemm@gmail.com> wrote:
>>>> 2009/8/27 Pavel Roskin <proski@gnu.org>:
>>>
>>>> Current code works fine (i 've checked it against various cards),
>>>> there is nothing wrong
>>>> with having another function for reading turbo modes, i find it's
>>>> cleaner that way.
>>>
>>> Well, we also don't use the turbo modes at all and that's where the
>>> error is (IIRC) so it shouldn't have any impact. :)
>>
>> Again, why don't we just remove all that fucking turbo cruft?
>>
>>  Luis
>>
>
> Why should we remove it, we are discussing on implementing channel
> width setting for 5 and 10 MHz channels already so where is the
> problem supporting turbo mode (40MHz) ?

Supporting 5 MHz and 10 MHz channels is very different from supporting
Turbo (40 MHz). 5 MHz and 10 MHz channels seem to be something you
can use as per 802.11; 40 MHz "Turbo" stuff is just a vendor extension,
and for my part I don't want to lift a finger to support it, nor do I
think it's a good idea to support it. Other people
have objected to vendor extensions before on mac80211 so I don't think
you'll find much support for this from a lot of people.

The way I see it, if you want vendor extensions like Atheros Turbo or
XR, use MadWifi.

> Also EEPROM code should read the eeprom and fill the structs, since
> these infos are there we should read them, i don't see any reason to
> skip them

I didn't see Bob's patch remove that stuff. It's pointless to use it though.

> i thought our goal was to support this hw as much as
> possible,

We should support users as best as possible, whether or not you
support vendor extensions is an entirely different story.

> if we want to get rid of MadWiFi we 'll have to at least
> support 5, 10 and 40MHz (turbo) channels.

I don't want to get rid of MadWifi, what we have now is a proper
upstream replacement. MadWifi is still a hack put together, and people
who want hacks can use that.

> I understand that there is
> no support yet on mac80211/cfg80211 but i don't think removing all
> this stuff and bring it back is the right thing to do.

I don't expect it will come back.

  Luis

^ permalink raw reply

* Re: (Re)assignment of PCI BARs
From: Beng Tan @ 2009-08-28  3:57 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel
In-Reply-To: <20090827161929.253e001d@jbarnes-g45>

Hi,

> A newer kernel could help, there have been some resource related fixes
> since 2.6.24.

Okay, I've compiled and tried the latest stable kernel (2.6.30.5)
using the instructions from
https://wiki.ubuntu.com/KernelTeam/GitKernelBuild. This kernel loads
fine, but I still have the issue.

Dumps available at

http://208.100.55.9/cold_boot_20090828-2.6.30.5-custom/dmesg.txt
http://208.100.55.9/cold_boot_20090828-2.6.30.5-custom/iomem.txt
http://208.100.55.9/cold_boot_20090828-2.6.30.5-custom/lspci_vv.txt

dmesg shows ...

[    0.214238] pci 0000:0d:00.0: reg 10 64bit mmio: [0x000000-0xfffffff]

indicating that my laptop's BIOS is in error and has assigned BAR 0 to
the first 256M of memory. This is obviously not right. My guess is that
Linux should be able to fix this?

However, later on in dmesg, I get ...

[    0.292416] pnp 00:00: mem resource (0x0-0x9fbff) overlaps 0000:0d:00.0 BAR 0 (0x0-0xfffffff), disabling
[    0.292421] pnp 00:00: mem resource (0x9fc00-0x9ffff) overlaps 0000:0d:00.0 BAR 0 (0x0-0xfffffff), disabling
[    0.292425] pnp 00:00: mem resource (0xc0000-0xcffff) overlaps 0000:0d:00.0 BAR 0 (0x0-0xfffffff), disabling
[    0.292429] pnp 00:00: mem resource (0xe0000-0xfffff) overlaps 0000:0d:00.0 BAR 0 (0x0-0xfffffff), disabling
[    0.292433] pnp 00:00: mem resource (0x100000-0x7f6d33ff) overlaps 0000:0d:00.0 BAR 0 (0x0-0xfffffff), disabling

and

[    0.344334] pci 0000:0d:00.0: BAR 0: can't allocate mem resource [0xe0000000-0xe01fffff]
[    0.344337] pci 0000:00:1c.3: PCI bridge, secondary bus 0000:0d
[    0.344341] pci 0000:00:1c.3:   IO window: 0xd000-0xdfff
[    0.344348] pci 0000:00:1c.3:   MEM window: 0xefb00000-0xefcfffff
[    0.344354] pci 0000:00:1c.3:   PREFETCH window: 0x000000e0000000-0x000000e01fffff

so Linux has not reallocated BAR 0 successfully.

> In your case, it looks like the BIOS isn't giving the bus with
> your card a large enough window (256M BAR on your card vs. a 2M windows
> on the bus).

Yes, the intervening bridge only has a 2M window. Is Linux supposed to
be intelligent enough to expand the bridge's window to 256M?
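
As a rough check of the sizes involved, the inclusive ranges printed in dmesg
can be turned into sizes with end - start + 1. The sketch below uses made-up
helper names (struct range, range_size(), bar_fits()); the addresses to feed
in are the ones from the dmesg lines above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Inclusive address range, as printed in dmesg: [start-end]. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Size in bytes of an inclusive range. */
static uint64_t range_size(struct range r)
{
	assert(r.end >= r.start);
	return r.end - r.start + 1;
}

/* A BAR can only be placed in a window at least as large as the BAR. */
static bool bar_fits(struct range bar, struct range window)
{
	return range_size(bar) <= range_size(window);
}
```

With the values above, BAR 0 (0x0-0xfffffff) comes out at 256M while the
prefetch window (0xe0000000-0xe01fffff) is 2M, so the BAR cannot fit unless
the window is enlarged.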

Would someone more knowledgeable be able to say whether this looks like a bug?

Also, I notice from iomem that PCI allocations start from 0xd0000000
up until the end of memory. There isn't actually a free 256M chunk in
this range.

Any idea who or what decides to start allocating at 0xd0000000? Is it
possible to tell Linux to start allocating at 0xc0000000 instead?
There's nothing there so the space is free.

Also, booting the kernel with pci=assign-busses didn't affect the
issue. It just changed the bus numbering around.

^ permalink raw reply

* Re: [PATCH 08/14] pktgen: reorganize transmit loop
From: Ben Greear @ 2009-08-28  3:52 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: David Miller, Robert Olsson, netdev, Thomas Gleixner
In-Reply-To: <20090827235705.740919364@vyatta.com>

+		default: /* Drivers are not supposed to return other values! */
+			if (net_ratelimit())
+				pr_info("pktgen: %s xmit error: %d\n",
+					odev->name, ret);
 			pkt_dev->errors++;

I believe this is faulty.  Things like vlans can send pkts to qdiscs
of the underlying device and those can return other values.

Patrick McHardy put in some patches recently to achieve this in a more
uniform manner:

http://patchwork.ozlabs.org/patch/28340/

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply

* Re: [RFC] Infrared Keycode standardization
From: Mauro Carvalho Chehab @ 2009-08-28  3:46 UTC (permalink / raw)
  To: Devin Heitmueller
  Cc: Ville Syrjälä, Linux Media Mailing List, Linux Input
In-Reply-To: <829197380908271506i251b47caoe8c08d483e78e938@mail.gmail.com>

Em Thu, 27 Aug 2009 18:06:51 -0400
Devin Heitmueller <dheitmueller@kernellabs.com> escreveu:

> Since we're on the topic of IR support, there are probably a couple of
> other things we may want to be thinking about if we plan on
> refactoring the API at all:
> 
> 1.  The fact that for RC5 remote controls, the tables in ir-keymaps.c
> only have the second byte.  In theory, they should have both bytes
> since the vendor byte helps prevent receiving spurious commands from
> unrelated remote controls.  We should include the ability to "ignore
> the vendor byte" so we can continue to support all the remotes
> currently in the ir-keymaps.c where we don't know what the vendor byte
> should contain.

This was done for at least two reasons:

1) Several boards use only a few GPIO bits (in general 7 bits or fewer) for IR.
There is logic in ir-common.ko that converts a 32-bit GPIO read into a 7-bit
scancode.

2) In order to properly support the default EVIOCGKEYCODE/EVIOCSKEYCODE
handlers, we need to have a keycode table where the scan code is the index. So,
if we use 14 bits for it, this table would reserve 16384 bytes and would
probably use very few of them (an IR with 64 keys would need only 64
entries).

As it seems that there are some ways to replace the default
getkeycode/setkeycode handlers, I suspect that we can get rid of this limitation.
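
As a rough illustration of the size argument (a sketch only, not the ir-common
code): a table indexed directly by a 14-bit scancode needs 2^14 = 16384 slots,
while a table of (scancode, keycode) pairs needs one entry per key. The struct,
the function and the sample values below are all hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Slots needed by a table indexed directly by a 14-bit scancode. */
#define DIRECT_TABLE_SLOTS (1u << 14)

/* Hypothetical sparse mapping: one entry per key the remote really has. */
struct ir_key_entry {
	uint16_t scancode;	/* full scancode, vendor byte included */
	uint16_t keycode;	/* input layer keycode */
};

/* Made-up sample map; the values are for illustration only. */
static const struct ir_key_entry sample_map[] = {
	{ 0x1e01, 2 },
	{ 0x1e02, 3 },
	{ 0x1e3d, 116 },
};

/* Return the keycode for scancode, or -1 if the remote has no such key. */
static int ir_lookup_keycode(const struct ir_key_entry *map, size_t n,
			     uint16_t scancode)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (map[i].scancode == scancode)
			return map[i].keycode;

	return -1;
}
```

A linear search is enough for a remote with a few dozen keys; a sorted table
with a binary search would be the obvious next step if the maps grow.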

I'll do some tests here with dib0700 and em28xx devices.

> 2.  The fact that the current API provides no real way to change the
> mode of operation for the IR receiver, for those receivers that
> support multiple modes (NEC/RC5/RC6).  While you have the ability to
> change the mapping table from userland via the keytable program, there
> is currently no way to tell the IR receiver which mode to operate in.

In this case, we'll need a set of new ioctls at the event interface to
allow enumerating/getting/setting the IR protocol type(s) per event device.

> One would argue that the above keymaps structure should include new
> fields to indicate what type of remote it is (NEC/RC5/RC6 etc), as
> well as field to indicate that the vendor codes are absent from the
> key mapping for that remote).  Given this, I can change the dib0700
> and em28xx IR receivers to automatically set the IR capture mode
> appropriate based on which remote is in the device profile.

Let's go step by step. Adding the ability to dynamically change the type of
remote will likely cause major changes to the GPIO polling code, since we'll
need to move some code from bttv and saa7134 into ir-functions.c and rework
it. We'll probably end up converting the remaining polling code to use high
precision timers, as we've done with cx88.

So, we need a sort of TODO list for IR changes. A starting point (in no
particular order) would be something like:

1) Standardize the keycodes;
2) Implement a v4l handler for EVIOCGKEYCODE/EVIOCSKEYCODE;
3) Use a different arrangement for IR tables so as not to spend 16 K on an IR
table, while still allowing a full RC5 table;
4) Use the common IR framework in the dvb drivers that have their own
implementation;
5) Allow getkeycode/setkeycode to work with the dvb framework using the new
methods;
6) Implement new event ioctls (EVIOEPROTO/EVIOGPROTO/EVIOSPROTO ?), to allow
enumerating/getting/setting the IR protocol types;
7) Change the non-gpio drivers to support IR protocol type;
8) Create a gpio handler that supports changing the protocol type;
9) Migrate the remaining drivers to the new gpio handler methods;
10) Merge pertinent lirc drivers;
11) Add missing keys to input.h.



Cheers,
Mauro

^ permalink raw reply

* Re: [PATCH] Fix overridable written with an extra 'e'
From: Todd Zullinger @ 2009-08-28  3:43 UTC (permalink / raw)
  To: Nanako Shiraishi; +Cc: git, Junio C Hamano
In-Reply-To: <20090828121849.6117@nanako3.lavabit.com>

[-- Attachment #1: Type: text/plain, Size: 1571 bytes --]

Nanako Shiraishi wrote:
> Found during the lunch break by one of my students...

Is overridable a word itself?  While English is my native language, I
wouldn't call myself an expert on its proper usage. ;)

However, I can't find 'overridable' in several online dictionaries:

http://dictionary.reference.com/browse/overridable
http://www.merriam-webster.com/dictionary/overridable
http://www.google.com/dictionary?aq=f&langpair=en|en&q=overridable&hl=en
http://dictionary.cambridge.org/results.asp?searchword=overridable&x=0&y=0

Perhaps using overridden would be more accurate?

>  	Prune loose objects older than date (default is 2 weeks ago,
> -	overrideable by the config variable `gc.pruneExpire`).  This
> +	overridable by the config variable `gc.pruneExpire`).  This
>  	option is on by default.

 	Prune loose objects older than date (default is 2 weeks ago,
-	overrideable by the config variable `gc.pruneExpire`).  This
+	which can be overridden by the config variable `gc.pruneExpire`).  This
 	option is on by default.

> -		warn "feature $name is not overrideable";
> +		warn "feature $name is not overridable";

and

-		warn "feature $name is not overrideable";
+		warn "feature $name cannot be overridden";

?

-- 
Todd        OpenPGP -> KeyID: 0xBEAF0CE3 | URL: www.pobox.com/~tmz/pgp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I got stopped by a cop the other day.  He said, "Why'd you run that
stop sign?"  I said, "Because I don't believe everything I read."
    -- Stephen Wright


[-- Attachment #2: Type: application/pgp-signature, Size: 542 bytes --]

^ permalink raw reply

* ipw2200: firmware DMA loading rework
From: Zhu Yi @ 2009-08-28  3:42 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Mel Gorman, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
	Linux Kernel Mailing List, Kernel Testers List,
	Bartlomiej Zolnierkiewicz, Mel Gorman, netdev@vger.kernel.org,
	linux-mm@kvack.org, James Ketrenos, Chatre, Reinette,
	linux-wireless@vger.kernel.org,
	ipw2100-devel@lists.sourceforge.net
In-Reply-To: <20090826074409.606b5124.akpm@linux-foundation.org>

Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
likely to fail and should always be avoided.

The patch fixes this problem by replacing the original order-6
pci_alloc_consistent() with an array of order-1 pages from a pci pool.
This utilizes the ipw2200 DMA command blocks (up to 64 slots). The
maximum supported firmware size remains the same (64*8K).

This patch fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14016

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
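Note (not part of the commit message): the block splitting used by the patch
is plain ceiling division. A standalone sketch of that arithmetic follows. The
constant values are assumptions derived from the "64*8K" figure above, and
blocks_needed(), block_size() and chunk_fits() are illustrative names, not
driver functions.

```c
#include <stdint.h>

#define CB_MAX_LENGTH			0x2000u	/* assumed: 8K per block */
#define CB_NUMBER_OF_ELEMENTS_SMALL	64u	/* assumed: 64 slots */

/* Number of command blocks needed for a chunk (rounds up). */
static uint32_t blocks_needed(uint32_t chunk_len)
{
	return (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
}

/* Bytes carried by block i of a chunk; only the last block may be short. */
static uint32_t block_size(uint32_t chunk_len, uint32_t i)
{
	uint32_t remaining = chunk_len - i * CB_MAX_LENGTH;

	return remaining < CB_MAX_LENGTH ? remaining : CB_MAX_LENGTH;
}

/* True if the chunk still fits, given the number of slots already used. */
static int chunk_fits(uint32_t used, uint32_t chunk_len)
{
	return used + blocks_needed(chunk_len) <= CB_NUMBER_OF_ELEMENTS_SMALL;
}
```

For example, a 20000 byte chunk needs three blocks of 8192, 8192 and 3616
bytes. Checking chunk_fits() before allocating would be one way to reject an
oversized image up front.
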
 drivers/net/wireless/ipw2x00/ipw2200.c |  120 ++++++++++++++++++--------------
 1 files changed, 67 insertions(+), 53 deletions(-)

diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
index 6dcac73..f593fbb 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -2874,45 +2874,27 @@ static int ipw_fw_dma_add_command_block(struct ipw_priv *priv,
 	return 0;
 }
 
-static int ipw_fw_dma_add_buffer(struct ipw_priv *priv,
-				 u32 src_phys, u32 dest_address, u32 length)
+static int ipw_fw_dma_add_buffer(struct ipw_priv *priv, dma_addr_t *src_address,
+				 int nr, u32 dest_address, u32 len)
 {
-	u32 bytes_left = length;
-	u32 src_offset = 0;
-	u32 dest_offset = 0;
-	int status = 0;
+	int ret, i;
+	u32 size;
+
 	IPW_DEBUG_FW(">> \n");
-	IPW_DEBUG_FW_INFO("src_phys=0x%x dest_address=0x%x length=0x%x\n",
-			  src_phys, dest_address, length);
-	while (bytes_left > CB_MAX_LENGTH) {
-		status = ipw_fw_dma_add_command_block(priv,
-						      src_phys + src_offset,
-						      dest_address +
-						      dest_offset,
-						      CB_MAX_LENGTH, 0, 0);
-		if (status) {
+	IPW_DEBUG_FW_INFO("nr=%d dest_address=0x%x len=0x%x\n",
+			  nr, dest_address, len);
+
+	for (i = 0; i < nr; i++) {
+		size = min_t(u32, len - i * CB_MAX_LENGTH, CB_MAX_LENGTH);
+		ret = ipw_fw_dma_add_command_block(priv, src_address[i],
+						   dest_address +
+						   i * CB_MAX_LENGTH, size,
+						   0, 0);
+		if (ret) {
 			IPW_DEBUG_FW_INFO(": Failed\n");
 			return -1;
 		} else
 			IPW_DEBUG_FW_INFO(": Added new cb\n");
-
-		src_offset += CB_MAX_LENGTH;
-		dest_offset += CB_MAX_LENGTH;
-		bytes_left -= CB_MAX_LENGTH;
-	}
-
-	/* add the buffer tail */
-	if (bytes_left > 0) {
-		status =
-		    ipw_fw_dma_add_command_block(priv, src_phys + src_offset,
-						 dest_address + dest_offset,
-						 bytes_left, 0, 0);
-		if (status) {
-			IPW_DEBUG_FW_INFO(": Failed on the buffer tail\n");
-			return -1;
-		} else
-			IPW_DEBUG_FW_INFO
-			    (": Adding new cb - the buffer tail\n");
 	}
 
 	IPW_DEBUG_FW("<< \n");
@@ -3160,59 +3142,91 @@ static int ipw_load_ucode(struct ipw_priv *priv, u8 * data, size_t len)
 
 static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 {
-	int rc = -1;
+	int ret = -1;
 	int offset = 0;
 	struct fw_chunk *chunk;
-	dma_addr_t shared_phys;
-	u8 *shared_virt;
+	int total_nr = 0;
+	int i;
+	struct pci_pool *pool;
+	u32 *virts[CB_NUMBER_OF_ELEMENTS_SMALL];
+	dma_addr_t phys[CB_NUMBER_OF_ELEMENTS_SMALL];
 
 	IPW_DEBUG_TRACE("<< : \n");
-	shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
 
-	if (!shared_virt)
+	pool = pci_pool_create("ipw2200", priv->pci_dev, CB_MAX_LENGTH, 0, 0);
+	if (!pool) {
+		IPW_ERROR("pci_pool_create failed\n");
 		return -ENOMEM;
-
-	memmove(shared_virt, data, len);
+	}
 
 	/* Start the Dma */
-	rc = ipw_fw_dma_enable(priv);
+	ret = ipw_fw_dma_enable(priv);
 
 	/* the DMA is already ready this would be a bug. */
 	BUG_ON(priv->sram_desc.last_cb_index > 0);
 
 	do {
+		u32 chunk_len;
+		u8 *start;
+		int size;
+		int nr = 0;
+
 		chunk = (struct fw_chunk *)(data + offset);
 		offset += sizeof(struct fw_chunk);
+		chunk_len = le32_to_cpu(chunk->length);
+		start = data + offset;
+
+		nr = (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
+		for (i = 0; i < nr; i++) {
+			virts[total_nr] = pci_pool_alloc(pool, GFP_KERNEL,
+							 &phys[total_nr]);
+			if (!virts[total_nr]) {
+				ret = -ENOMEM;
+				goto out;
+			}
+			size = min_t(u32, chunk_len - i * CB_MAX_LENGTH,
+				     CB_MAX_LENGTH);
+			memcpy(virts[total_nr], start, size);
+			start += size;
+			total_nr++;
+			/* We don't support fw chunk larger than 64*8K */
+			BUG_ON(total_nr > CB_NUMBER_OF_ELEMENTS_SMALL);
+		}
+
 		/* build DMA packet and queue up for sending */
 		/* dma to chunk->address, the chunk->length bytes from data +
 		 * offeset*/
 		/* Dma loading */
-		rc = ipw_fw_dma_add_buffer(priv, shared_phys + offset,
-					   le32_to_cpu(chunk->address),
-					   le32_to_cpu(chunk->length));
-		if (rc) {
+		ret = ipw_fw_dma_add_buffer(priv, &phys[total_nr - nr],
+					    nr, le32_to_cpu(chunk->address),
+					    chunk_len);
+		if (ret) {
 			IPW_DEBUG_INFO("dmaAddBuffer Failed\n");
 			goto out;
 		}
 
-		offset += le32_to_cpu(chunk->length);
+		offset += chunk_len;
 	} while (offset < len);
 
 	/* Run the DMA and wait for the answer */
-	rc = ipw_fw_dma_kick(priv);
-	if (rc) {
+	ret = ipw_fw_dma_kick(priv);
+	if (ret) {
 		IPW_ERROR("dmaKick Failed\n");
 		goto out;
 	}
 
-	rc = ipw_fw_dma_wait(priv);
-	if (rc) {
+	ret = ipw_fw_dma_wait(priv);
+	if (ret) {
 		IPW_ERROR("dmaWaitSync Failed\n");
 		goto out;
 	}
-      out:
-	pci_free_consistent(priv->pci_dev, len, shared_virt, shared_phys);
-	return rc;
+ out:
+	for (i = 0; i < total_nr; i++)
+		pci_pool_free(pool, virts[i], phys[i]);
+
+	pci_pool_destroy(pool);
+
+	return ret;
 }
 
 /* stop nic */
-- 
1.5.3.6




^ permalink raw reply related

* Re: [PATCHv4 08/12] Teach the notes lookup code to parse notes trees with various fanout schemes
From: Junio C Hamano @ 2009-08-28  3:35 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: Jeff King, Johan Herland, git, Shawn O. Pearce,
	Johannes.Schindelin, trast, tavestbo, git, chriscool
In-Reply-To: <fabb9a1e0908272005i4bf9b906xba08a711d384dd83@mail.gmail.com>

Sverre Rabbelier <srabbelier@gmail.com> writes:

> On Thu, Aug 27, 2009 at 20:02, Junio C Hamano<gitster@pobox.com> wrote:
>> In such a case would you rather want to see the commit itself first, or at
>> least, commit _and_ notes _together_?
>
> Assuming you do download all notes, I think it would be nice to be
> able to read the note; and since there's no way to download the commit
> separately it would require one to guess which head the commit belongs
> to and fetch the entire branch...?

Some random thoughts...

 * If there are very many branches (in the worst case, they are so many
   that the upstream uses the expand extension to serve the project),
   maybe the notes namespace will also have many branches.  It is unclear
   how a user is expected to know in which notes branch a note for a
   particular commit is to be found.

 * Perhaps to solve that problem, such a project may use notes in the
   corresponding "notes branch"?  Then your assumption does not hold, as
   you first guess which notes branch to fetch to find the note that may
   not even exist for this issue to become a real problem.

 * If you assume all the notes are downloaded in such a project, you can
   still go through all the top-level trees (that are date based fan-out)
   and find the note to the commit object you do not have.  At that point,
   it only becomes a performance issue for the unusual case where you have
   a note but not the commit the note applies to.

^ permalink raw reply

* Discussion request for new Samsung SoCs maintaining
From: Harald Welte @ 2009-08-28  3:34 UTC (permalink / raw)
  To: linux-arm-kernel
In-Reply-To: <200908271357.n7RDv4U9028131@chronolytics.com>

Hi David,

On Thu, Aug 27, 2009 at 09:57:04AM -0400, David F. Carlson wrote:
 
> I have been working with the 6410 tree in several places.  A merge of samsung
> tree would be greatly appreciated.

Thanks.  Please note that this work is just starting, and it is a learning
curve for most people inside the samsung soc linux team as they have not
participated in the mainline development process so far.  So it will probably
be a slow but steady start.

Though the decision has been made, and they will work on this from now on
throughout the coming months.

> Please remember that since I am working SmartQ and there are 3 other 6410
> based MACHs in Ben's next-s3c tree.  
> 
> So, please, MACH_SMDK6410 != CPU_6410 and
>             MACH_SMDK6410 != PLAT_64XX.

Yes, this is a common accident/mistake in the current code.  We'll try to sort
them out before submitting patches for review.  But even if they persist, I'm
sure you or others will be happy to point those out at that time.

> The common peripheral support needs to be factored out so that "thin" MACH
> configs can set some per-MACH GPIOs/chipsets (LCD, power,etc.)

That sounds definitely useful, but it's probably another topic on its own.
Right now the aim is to get the code cleaned up and submitted.  Samsung is
only working/testing with the SMDK's.  Once more and more machines get into
mainline, there can probably be more code sharing among them.

-- 
- Harald Welte <laforge@gnumonks.org>           http://laforge.gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)

^ permalink raw reply

* Re: RFC: THE OFFLINE SCHEDULER
From: Gregory Haskins @ 2009-08-28  3:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
	Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
	linux-kernel, linux-rt-users
In-Reply-To: <4A973DAE.4020508@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

Hi Rik,

Rik van Riel wrote:
> Gregory Haskins wrote:
> 
>> 2) Modify FIFO so that it disables tick by default...update accounting
>> info at next reschedule event.
> 
> I like it.  The only thing to watch out for is that
> events that wake up higher-priority FIFO tasks do
> not get deferred :)
> 

Yeah, agreed.  My (potentially half-baked) proposal should work at least
from a pure scheduling perspective since FIFO technically does not
reschedule based on a tick, and wakeups/migrations should still work
bidirectionally with existing scheduler policies.

However, and to what I believe is your point: it's not entirely clear to
me what impact, if any, there would be w.r.t. any _other_ events that
may be driven off of the scheduler tick (i.e. events other than
scheduling policies, like timeslice expiration, etc).  Perhaps someone
else like Thomas, Ingo, or Peter has some input here.

I guess the specific question to ask is: Does the scheduler tick code
have any role other than timeslice policies and updating accounting
information?  Examples would include timer-expiry, for instance.  I
would think most of this logic is handled by finer grained components
like HRT, but I am admittedly ignorant of the actual timer voodoo ;)

Kind Regards,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]

^ permalink raw reply

* Re: RFC: THE OFFLINE SCHEDULER
From: Gregory Haskins @ 2009-08-28  3:33 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Thomas Gleixner, Christoph Lameter, Chris Friesen, raz ben yehuda,
	Andrew Morton, mingo, peterz, maximlevitsky, efault, wiseman,
	linux-kernel, linux-rt-users
In-Reply-To: <4A973DAE.4020508@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 1293 bytes --]

Hi Rik,

Rik van Riel wrote:
> Gregory Haskins wrote:
> 
>> 2) Modify FIFO so that it disables tick by default...update accounting
>> info at next reschedule event.
> 
> I like it.  The only thing to watch out for is that
> events that wake up higher-priority FIFO tasks do
> not get deferred :)
> 

Yeah, agreed.  My (potentially half-baked) proposal should work at least
from a pure scheduling perspective since FIFO technically does not
reschedule based on a tick, and wakeups/migrations should still work
bidirectionally with existing scheduler policies.

However, and to what I believe is your point: it's not entirely clear to
me what impact, if any, there would be w.r.t. any _other_ events that
may be driven off of the scheduler tick (i.e. events other than
scheduling-policy work such as timeslice expiration).  Perhaps someone
else like Thomas, Ingo, or Peter has some input here.

I guess the specific question to ask is: Does the scheduler tick code
have any role other than timeslice policies and updating accounting
information?  Examples would include timer-expiry, for instance.  I
would think most of this logic is handled by finer grained components
like HRT, but I am admittedly ignorant of the actual timer voodoo ;)

Kind Regards,
-Greg


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 267 bytes --]

^ permalink raw reply

* Re: [PATCH] Make radix_tree_preload alloc one more slot
From: Wu Fengguang @ 2009-08-28  3:31 UTC (permalink / raw)
  To: Yan, Zheng 
  Cc: Zhu Yanhai, Andrew Morton, Jiri Kosina, Huang Shijie,
	linux-kernel@vger.kernel.org, Zhu Yanhai, Nick Piggin
In-Reply-To: <3d0408630908271933v1d136b85he33e7d4b6e0bccf6@mail.gmail.com>

On Fri, Aug 28, 2009 at 10:33:03AM +0800, Yan, Zheng  wrote:
> 2009/8/28 Wu Fengguang <fengguang.wu@intel.com>:
> > On Fri, Aug 28, 2009 at 12:46:46AM +0800, Yan, Zheng  wrote:
> >> 2009/8/27 Wu Fengguang <fengguang.wu@intel.com>:
> >> >
> >> > Hi Yanhai,
> >> >
> >> > [Nick CCed]
> >> >
> >> > On Thu, Aug 27, 2009 at 08:10:41PM +0800, Zhu Yanhai wrote:
> >> >> The operations against radix tree always use paths with RADIX_TREE_MAX_PATH
> >> >> + 1 slots, but radix_tree_preload only pre-allocs RADIX_TREE_MAX_PATH
> >> >> slots at present, which causes radix_tree_node_alloc tries to do
> >> >> kmem_cache_alloc at the last slot even if we don't have gfp_mask &
> >> >> __GFP_WAIT in hand.
> >> >
> >> > Are you sure?  The comments read:
> >> >
> >> >        /*
> >> >         * The radix tree path needs to be one longer than the maximum path
> >> >         * since the "list" is null terminated.
> >> >         */
> >> >        struct radix_tree_path path[RADIX_TREE_MAX_PATH + 1], *pathp = path;
> >> >
> >> > Thanks,
> >> > Fengguang
> >> >
> >> >
> >> >> Signed-off-by: Zhu Yanhai <zhu.yanhai@gmail.com>
> >> >>
> >> >> ---
> >> >>  lib/radix-tree.c |    2 +-
> >> >>  1 files changed, 1 insertions(+), 1 deletions(-)
> >> >>
> >> >> diff --git a/lib/radix-tree.c b/lib/radix-tree.c
> >> >> index 23abbd9..72225a8 100644
> >> >> --- a/lib/radix-tree.c
> >> >> +++ b/lib/radix-tree.c
> >> >> @@ -79,7 +79,7 @@ static struct kmem_cache *radix_tree_node_cachep;
> >> >>   */
> >> >>  struct radix_tree_preload {
> >> >>       int nr;
> >> >> -     struct radix_tree_node *nodes[RADIX_TREE_MAX_PATH];
> >> >> +     struct radix_tree_node *nodes[RADIX_TREE_MAX_PATH + 1];
> >> >>  };
> >> >>  static DEFINE_PER_CPU(struct radix_tree_preload, radix_tree_preloads) = { 0, };
> >> >>
> >> >> --
> >> >> 1.6.2.2
> >>
> >> here is test case.
> >> ---
> >> #include <linux/module.h>
> >> #include <linux/kernel.h>
> >> #include <linux/radix-tree.h>
> >>
> >> static void __exit exit_test(void)
> >> {
> >> }
> >>
> >> static int __init init_test(void)
> >> {
> >>         struct radix_tree_root radix_tree;
> >>         int foo;
> >>
> >>         INIT_RADIX_TREE(&radix_tree, GFP_KERNEL);
> >>         radix_tree_preload(GFP_KERNEL);
> >>         preempt_disable();
> >>         radix_tree_insert(&radix_tree, (unsigned long)-2, &foo);
> >>         preempt_enable();
> >>         radix_tree_preload_end();
> >>         radix_tree_delete(&radix_tree, (unsigned long)-2);
> >>         return -1;
> >> }
> >>
> >> module_init(init_test)
> >> module_exit(exit_test)
> >> MODULE_LICENSE("GPL");
> >> -- end --
> >>
> >> I got following oops.
> >> ---
> >> BUG: sleeping function called from invalid context at
> >> /home/zhyan/linux-2.6/mm/slub.c:1697
> >> in_atomic(): 1, irqs_disabled(): 0, pid: 2791, name: insmod
> >> Pid: 2791, comm: insmod Not tainted 2.6.31-rc7 #1
> >> Call Trace:
> >>  [<c042b26a>] __might_sleep+0x101/0x108
> >>  [<c04b3976>] kmem_cache_alloc+0x39/0x141
> >>  [<c05453c7>] ? radix_tree_node_alloc+0x4c/0x5d
> >>  [<c05453c7>] radix_tree_node_alloc+0x4c/0x5d
> >>  [<c054549a>] radix_tree_insert+0xc2/0x174
> >>  [<f90cd03f>] init_test+0x3f/0x89 [test]
> >>  [<c0401141>] do_one_initcall+0x4f/0x11f
> >>  [<f90cd000>] ? init_test+0x0/0x89 [test]
> >>  [<c044e504>] ? __blocking_notifier_call_chain+0x45/0x51
> >>  [<c045d833>] sys_init_module+0xac/0x1bc
> >>  [<c0403458>] sysenter_do_call+0x12/0x2d
> >> -- end --
> > This is weird, I cannot reproduce this message. Here is my .config.
> > I use SLAB but it also has the might_sleep_if(__GFP_WAIT) debug call.
> > Thanks,
> > Fengguang
> 
> I realize it's my fault, I use GFP_KERNEL to initialize the radix tree.
> radix_tree_node_alloc does not use pre-loaded memory when the
> radix tree is initialized with GFP_KERNEL. Yesterday when I saw the

Ah yes!

> Oops, I suspect it's an radix_tree_preload bug. I told Zhu Yanhai
> my thought. Please forgive my rash act.

That's all right. So we can just do nothing? :)
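
To spell out the behaviour Zheng describes, here is a rough userspace
model. It is only a sketch, not the code from lib/radix-tree.c: the
SKETCH_* names and flag values are made up for illustration, and node ids
stand in for node pointers. The point it shows is that the preloaded pool
is only consulted when the tree's gfp mask does not allow waiting, so a
tree set up with GFP_KERNEL goes straight to the slab allocator.

```c
/* Hypothetical stand-ins for the kernel's gfp bits; values are illustrative. */
#define SKETCH_GFP_WAIT   0x10u
#define SKETCH_GFP_KERNEL (SKETCH_GFP_WAIT | 0x40u | 0x80u)
#define SKETCH_GFP_ATOMIC 0x20u

#define SKETCH_POOL_SIZE 4

/* Simplified model of the per-CPU preload pool. */
struct sketch_preload {
	int nr;                       /* number of preloaded nodes left */
	int nodes[SKETCH_POOL_SIZE];  /* node ids stand in for node pointers */
};

enum sketch_source { SKETCH_FROM_POOL, SKETCH_FROM_SLAB };

/*
 * Decide where a new node comes from. The pool is used only when the
 * tree's gfp mask lacks the "may wait" bit and the pool is not empty;
 * otherwise the allocator is called with the tree's own mask, which
 * may sleep.
 */
static enum sketch_source sketch_node_alloc(unsigned int tree_gfp_mask,
					    struct sketch_preload *rtp)
{
	if (!(tree_gfp_mask & SKETCH_GFP_WAIT) && rtp->nr > 0) {
		rtp->nr--;            /* consume one preloaded node */
		return SKETCH_FROM_POOL;
	}
	return SKETCH_FROM_SLAB;      /* fall back to the slab allocator */
}
```

In this model, a tree created with the SKETCH_GFP_KERNEL mask never
touches the pool, which matches the might_sleep warning in the test case:
the insert ran with preemption disabled but still called the allocator
with a mask that permits sleeping.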

Thanks,
Fengguang


^ permalink raw reply

* RE: nVidia Geforce 8400 GS PCI Express x16 VGA Pass Through to Windows XP Home 32-bit HVM Virtual Machine with Intel Desktop Board DQ45CB
From: Han, Weidong @ 2009-08-28  3:30 UTC (permalink / raw)
  To: 'enming.teo@asiasoftsea.net',
	'djmagee@mageenet.net'
  Cc: 'xen-devel@lists.xensource.com'
In-Reply-To: <B17DADB39C0E46F79712BC4E94F8271D@ASOITIS16>

It's easy to disable emulated VGA; you just add a check in hw/pc.c:
        +    if (gfx_passthru == GFX_NO_PASSTHRU) {
        +       if (cirrus_vga_enabled) {
        ......

That means: if gfx is passed through, emulated VGA is disabled.

As for your other question about loading the VGA BIOS: given that you
have saved the VGA BIOS of your card to vgabios-pt.bin, first put it in
xen-unstable.hg/tools/firmware/vgabios/. You also need the following
changes. Finally, you still need to pass the BDF of the gfx card to
rombios per the firmware spec.

diff -r 494be76c1ad9 tools/firmware/hvmloader/Makefile
--- a/tools/firmware/hvmloader/Makefile Thu Aug 27 16:54:33 2009 +0800
+++ b/tools/firmware/hvmloader/Makefile Thu Aug 27 17:22:01 2009 +0800
@@ -50,6 +50,7 @@ roms.h: ../rombios/BIOS-bochs-latest ../
 roms.h: ../rombios/BIOS-bochs-latest ../vgabios/VGABIOS-lgpl-latest.bin \
    ../vgabios/VGABIOS-lgpl-latest.cirrus.bin ../etherboot/eb-roms.h
    sh ./mkhex rombios ../rombios/BIOS-bochs-latest > roms.h
+   sh ./mkhex vgabios_pt ../vgabios/vgabios-pt.bin >> roms.h
    sh ./mkhex vgabios_stdvga ../vgabios/VGABIOS-lgpl-latest.bin >> roms.h
    sh ./mkhex vgabios_cirrusvga \
        ../vgabios/VGABIOS-lgpl-latest.cirrus.bin >> roms.h
diff -r 494be76c1ad9 tools/firmware/hvmloader/hvmloader.c
--- a/tools/firmware/hvmloader/hvmloader.c  Thu Aug 27 16:54:33 2009 +0800
+++ b/tools/firmware/hvmloader/hvmloader.c  Thu Aug 27 17:23:00 2009 +0800
@@ -688,9 +688,9 @@ int main(void)
         vgabios_sz = round_option_rom(sizeof(vgabios_stdvga));
         break;
     case VGA_pt:
-        printf("Loading Gfx Video BIOS from 0xC0000 ...\n");
-        vgabios_sz =
-            round_option_rom((*(uint8_t *)(VGABIOS_PHYSICAL_ADDRESS+2)) * 512);
+         printf("Loading Gfx Video BIOS from file ...\n");
+         memcpy((void *)VGABIOS_PHYSICAL_ADDRESS, vgabios_pt, sizeof(vgabios_pt));
+         vgabios_sz = round_option_rom(sizeof(vgabios_pt));
         break;
     default:
         printf("No emulated VGA adaptor ...\n");
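
For reference, the removed lines in the hvmloader.c hunk take the image
size from the option ROM header (the byte at offset 2 counts 512-byte
blocks), while the added lines use the size of the embedded array. Below
is a small sketch of the header-based calculation. The sketch_* names are
hypothetical, and the 2 KiB rounding is my assumption about what
round_option_rom does, so please check it against the tree you build.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed granularity: round image sizes up to a 2 KiB boundary. */
#define SKETCH_ROM_ALIGN 2048u

/* Round a size up to the next multiple of SKETCH_ROM_ALIGN. */
static uint32_t sketch_round_option_rom(uint32_t size)
{
	return (size + SKETCH_ROM_ALIGN - 1) & ~(SKETCH_ROM_ALIGN - 1);
}

/*
 * Return the image length declared in a legacy option ROM header, or 0
 * if the 0x55 0xAA signature is missing. Byte 2 of the header holds the
 * length in 512-byte units.
 */
static uint32_t sketch_option_rom_size(const uint8_t *rom, size_t len)
{
	if (len < 3 || rom[0] != 0x55 || rom[1] != 0xAA)
		return 0;
	return (uint32_t)rom[2] * 512u;
}
```

Running a check like this on vgabios-pt.bin before building is a cheap way
to confirm the dump starts with a valid ROM header.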



Teo En Ming (Zhang Enming) wrote:
> Dear Weidong,
>
> I have read through the technical papers on VGA passthrough presented
> by the University of Amsterdam and the University of Michigan and
> thus gained a better understanding of VGA passthrough in Xen.
>
> For my preferred setup, I would not be able to adopt the guidance
> provided by the University of Amsterdam. I should adopt the guidance
> provided by the University of Michigan. I want dom0 to have access to
> Intel GMA 4500 onboard graphics while the Windows XP Home HVM domU to
> have access to PCI Express x16 graphics.
>
> I have taken the first step in examining the source code for
> disabling the Cirrus emulated vga card bootup in Xen 3.5-unstable.
>
> According to Beng Heng's UMICH technical paper, he referenced source
> file tools/ioemu-remote/hw/pc.c for disabling emulated vga device.
>
> I went to the xen-unstable mercurial reponsitory
> http://xenbits.xensource.com/xen-unstable.hg and found
> tools/ioemu/hw/pc.c
>
> Now, my objective is to disable the emulated/virtual vga card and
> load the bios of my nvidia geforce 8400 gs pcie x16 graphics card.
>
> Source code file
> http://xenbits.xensource.com/xen-unstable.hg?file/18b41609a980/tools/ioemu/hw/pc.c
>
> =========================================================================
>
> Line 862 says that if variable cirrus_vga_enabled returns true
> (having a value of 1), the cirrus vga bios VGABIOS_CIRRUS_FILENAME
> will be loaded. Otherwise, an alternative vga bios file
> VGABIOS_FILENAME will be loaded.
>
> On line 30 VGABIOS_FILENAME is defined.
>
> On line 31 VGABIOS_CIRRUS_FILENAME is defined.
>
> On line 24 a header file vl.h is referenced.
>
> Source code file
> http://xenbits.xensource.com/xen-unstable.hg?file/7eefe6399bcd/tools/ioemu/vl.h
>
> ===========================================================================
>
> On line 177 the integer variable cirrus_vga_enabled is declared but I
> could not find any line where this variable is initialized to a
> value. But I found file vl.c.
>
> Source code file
> http://xenbits.xensource.com/xen-unstable.hg?file/dade7f0bdc8d/tools/ioemu/vl.c
>
> ============================================================================
>
> On line 168 integer variable cirrus_vga_enabled is initialized to a
> value of 1. However, I want to disable cirrus vga and load my nvidia
> geforce 8400 gs vga bios.
>
> So I should change line 168 to "int cirrus_vga_enabled = 0;" in my
> case.
>
> Hence line 865 in file pc.c would be executed.
>
> After extracting the vga bios of my nvidia geforce 8400 gs pcie x16
> graphics card using the nvflash.exe utility and saving the firmware
> file as nv-gf-8400-gs.bin, I should place this bios image file in the
> xen-unstable sources directory on my PC. It should be in the same
> directory where files vgabios.bin and vgabios-cirrus.bin are found.
>
> Then I should change line 30 in file pc.c from '#define
> VGABIOS_FILENAME "vgabios.bin"' to '#define VGABIOS_FILENAME
> "nv-gf-8400-gs.bin"' and then recompile and install the source code.
>
> Besides disabling the emulated cirrus vga card, may I know what other
> things I need to take care of? Does the existing xen 3.5-unstable
> source code already take care of re-executing the geforce 8400 gs vga
> bios, passing through the framebuffer and vga I/O ports, and
> intercepting the values in the host bridge, so that I do not have to
> hack the other relevant source code files?
>
> Thank you very much.
>
> Regards,
>
> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics Engineering)
> BEng(Hons)(Mechanical Engineering)
> Technical Support Engineer
> Information Technology Department
> Asiasoft Online Pte Ltd
> Tampines Central 1 #04-01 Tampines Plaza
> Singapore 529541
> Republic of Singapore
> Mobile: +65-9648-9798
> MSN: teoenming@hotmail.com
> Alma Maters: Singapore Polytechnic, National University of Singapore
>
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Teo En
> Ming (Zhang Enming)
> Sent: Thursday, August 27, 2009 2:03 PM
> To: enming.teo@asiasoftsea.net; weidong.han@intel.com;
> djmagee@mageenet.net
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16 VGA
> PassThrough to Windows XP Home 32-bit HVM Virtual Machine withIntel
> Desktop Board DQ45CB
>
> I think I cannot use NiBiTor already as I don't have any native
> Windows operating system running on my computer.
>
> Instead, I should create a CD-ROM boot disc that contains nvflash.exe
> and also loads USB drivers.
>
> I will use nvflash.exe to extract out and save VGA BIOS to USB sticks.
>
> NVIDIA GeForce BIOS Backup Guide:
> http://www.mvktech.net/content/view/14/37/
>
> Motherboard Flash Boot CD from Linux Mini HOWTO:
> http://www.nenie.org/misc/flashbootcd.html
>
> DOS USB Drivers: http://www.bootdisk.com/usb.htm
>
> Regards,
>
> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics Engineering)
> BEng(Hons)(Mechanical Engineering)
> Technical Support Engineer
> Information Technology Department
> Asiasoft Online Pte Ltd
> Tampines Central 1 #04-01 Tampines Plaza
> Singapore 529541
> Republic of Singapore
> Mobile: +65-9648-9798
> MSN: teoenming@hotmail.com
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Teo En
> Ming (Zhang Enming)
> Sent: Thursday, August 27, 2009 11:18 AM
> To: weidong.han@intel.com; djmagee@mageenet.net
> Cc: xen-devel@lists.xensource.com
> Subject: RE: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16 VGA
> PassThrough to Windows XP Home 32-bit HVM Virtual Machine withIntel
> Desktop Board DQ45CB
>
> Dear Weidong,
>
> I will go for option 2.
>
>> don't initialize emulated VGA in qemu, I think you have seen the
>> code.
>
> I have not seen the source code. Though I have learnt simple
> structured C before in Singapore Polytechnic some 11 years ago, I am
> not a programmer by profession. So I would still need some
> instructions on how I can prevent emulated VGA bios from loading and
> getting the vga bios of the real display card to load.
>
> Thank you.
>
> Regards,
>
> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics Engineering)
> BEng(Hons)(Mechanical Engineering)
> Technical Support Engineer
> Information Technology Department
> Asiasoft Online Pte Ltd
> Tampines Central 1 #04-01 Tampines Plaza
> Singapore 529541
> Republic of Singapore
> Mobile: +65-9648-9798
> MSN: teoenming@hotmail.com
>
> -----Original Message-----
> From: xen-devel-bounces@lists.xensource.com
> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Han, Weidong
> Sent: Thursday, August 27, 2009 10:01 AM
> To: 'enming.teo@asiasoftsea.net'; 'djmagee@mageenet.net'
> Cc: 'xen-devel@lists.xensource.com'
> Subject: RE: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16 VGA
> PassThrough to Windows XP Home 32-bit HVM Virtual Machine with
> IntelDesktop Board DQ45CB
>
> Teo En Ming (Zhang Enming) wrote:
>> Hi Weidong,
>>
>> Currently I am using the entire Xen 3.5-unstable branch and not using
>> the posted pci-stub patch changeset 19893 to patch Xen 3.4.1-testing.
>> I am using Xen 3.5-unstable in conjunction with pvops dom 0 kernel
>> 2.6.31-rc6.
>>
>> Here are my questions:
>>
>> 1) How do I extract out the VGA BIOS/firmware from the
>> firmware/EEPROM chip on the Geforce 8400 GS VGA card? If it is not
>> possible to extract, where can I download the firmware? NVIDIA does
>> not provide display card firmware/BIOS files for public download on
>> its website.
>
> There are 3 ways to load VGA bios to guest:
> 1. copy it from host 0xc0000, and map it to guest 0xc0000. This works
> for the primary gfx card, and XCI does it this way. But some VGA bios
> re-execution doesn't work well.
> 2. dump VGA bios to a file first, then load it to guest: you can use
> NiBiTor to dump VGA bios of Nvidia gfx cards to a file, then load it
> into guest roms like what emulated VGA does in hvmloader. This works
> for all gfx cards, but it involves manual steps.
> 3. load VGA bios from expansion rom: Intel iGFX doesn't have
> expansion rom
>
>>
>> 2) How do I get the Geforce 8400 GS VGA BIOS to load instead of the
>> emulated VGA BIOS? Which configuration files/source codes do I have
>> to configure/patch?
>
> don't initialize emulated VGA in qemu, I think you have seen the
> code.
>
>>
>> 3) What are physical MMIO, virtual MMIO Bars, dsdt.asl and 1:1 map? I
>> am sorry but I am not a graphics card hardware engineer so I do not
>> understand those terms. I would appreciate it if you could write out
>> some steps that I could carry out or point me to already available
>> documentation.
>
> When you use command 'lspci -v', you may see below messages under each
> device:
>
> ...
> Memory at d0000000 (32-bit, prefetchable) [size=256M]
> Memory at ff900000 (32-bit, non-prefetchable) [size=1M]
> ...
>
> They are MMIO BARs. 1:1 map means making these MMIO BARs in the guest
> equal to the values in the host.
>
>>
>> 4) I don't mind using experimental code. Please post it on the
>> xen-devel mailing list so that I could try it out.
>
> we will post in near future. pls wait for a while.
>
> Regards,
> Weidong
>
>>
>> Thank you very much.
>>
>> Regards,
>>
>> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics Engineering)
>> BEng(Hons)(Mechanical Engineering)
>> Technical Support Engineer
>> Information Technology Department
>> Asiasoft Online Pte Ltd
>> Tampines Central 1 #04-01 Tampines Plaza
>> Singapore 529541
>> Republic of Singapore
>> Mobile: +65-9648-9798
>> MSN: teoenming@hotmail.com
>>
>> -----Original Message-----
>> From: Han, Weidong [mailto:weidong.han@intel.com]
>> Sent: Wednesday, August 26, 2009 6:56 PM
>> To: 'enming.teo@asiasoftsea.net'; 'djmagee@mageenet.net'
>> Cc: 'xen-devel@lists.xensource.com'
>> Subject: RE: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16 VGA
>> Pass Through to Windows XP Home 32-bit HVM Virtual Machine with
>> Intel Desktop Board DQ45CB
>>
>> Teo En Ming (Zhang Enming) wrote:
>>> Hi Weidong,
>>>
>>> Could you share with us the hack codes for making Geforce 8400 GS
>>> work and also how to let the Windows HVM guest boot up using the
>>> real BIOS of Geforce 8400 GS instead of an emulated VGA BIOS?
>>>
>>
>> What patch are you using now? Using the real VGA bios of the gfx card
>> to replace the emulated VGA bios is the prerequisite of gfx
>> passthrough. You can find it in posted gfx passthrough patches or XCI.
>> For the hack to make Geforce 8400 work, we reserve physical MMIO BARs
>> in dsdt.asl, and make a 1:1 map between physical MMIO BARs and virtual
>> MMIO BARs of the card. Currently our code is experimental; we will send
>> it out to the mailing list after cleanup and more tests.
>>
>> Regards,
>> Weidong
>>
>>> Thank you.
>>>
>>> Regards,
>>>
>>> Mr. Teo En Ming (Zhang Enming) Dip(Mechatronics Engineering)
>>> BEng(Hons)(Mechanical Engineering)
>>> Technical Support Engineer
>>> Information Technology Department
>>> Asiasoft Online Pte Ltd
>>> Tampines Central 1 #04-01 Tampines Plaza
>>> Singapore 529541
>>> Republic of Singapore
>>> Mobile: +65-9648-9798
>>> MSN: teoenming@hotmail.com
>>> Alma Maters: Singapore Polytechnic, National University of Singapore
>>>
>>> -----Original Message-----
>>> From: xen-devel-bounces@lists.xensource.com
>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Han, Weidong
>>> Sent: Wednesday, August 26, 2009 4:27 PM
>>> To: 'djmagee@mageenet.net'; 'enming.teo@asiasoftsea.net'
>>> Cc: 'xen-devel@lists.xensource.com'
>>> Subject: RE: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16
>>> VGAPassThroughto Windows XP Home 32-bit HVM Virtual Machine with
>>> IntelDesktop BoardDQ45CB
>>>
>>> I suppose you just use the patch posted in mailing list before.
>>> nVidia Geforce 8400 passthrough needs extra hacks. We can make it
>>> work in our experiments with 1:1 map of its MMIO BARs.
>>>
>>> We are working on gfx passthrough on latest xen-unstable. Firstly,
>>> we want to cook a simple patch including generic changes to support
>>> passthrough of virtualization friendly gfx cards, such as Nvidia
>>> FX3800. This patch is basically done. Then, we will add some hacks
>>> for more gfx cards passthrough, such as iGFX and some Nvidia and
>>> ATI cards.
>>>
>>> Regards,
>>> Weidong
>>>
>>> djmagee@mageenet.net wrote:
>>>> As I've pointed out on this list before, there are not enough PCIe
>>>> lanes on the DQ45CB to drive both the internal graphics adapter and
>>>> the add-on adapter at the same time.  I believe the onboard one may
>>>> be able to operate in some sort of VGA only mode when there is a
>>>> card installed and used as the primary adapter.
>>>>
>>>> I have a similar setup, using the same motherboard, and an ATI
>>>> 4770. I used the VGA passthrough patches from an earlier posting
>>>> (I saw you followed the same set of instructions), with limited
>>>> success. I've been using 3.4-testing, and 'xenified' 2.6.29.6
>>>> kernel.  I made modifications to the dom0 portion of the patches
>>>> so they would apply to my xenified 2.6.29 kernel.  These patches
>>>> include code that will copy the VGA bios to the guest bios instead
>>>> of using the emulated vga bios.  I've had very little success,
>>>> however.  I have only tried passing through the ATI adapter.  In
>>>> all instances, the guest bios messages appear on my monitor, so
>>>> this much works.  In some cases, the guest essentially stops
>>>> there; xm list shows about 2sec CPU usage and nothing ever
>>>> happens after that point.  In other cases, the guest (win
>>>> xp/vista/7, as well as KNOPPIX 5.3.1 DVD) will boot all the way,
>>>> but in very low res/color mode, and cannot properly initialize the
>>>> video device.  Once or twice, it actually did recognize the device
>>>> and had a reasonable default color/resolution combination.  In all
>>>> cases where the guest actually boots, the system eventually
>>>> freezes.  In some cases, I get endless streams of iommu page
>>>> faults.
>>>>
>>>> I have 8GB ram installed.  In all cases I've limited dom0 memory to
>>>> 2GB.  In all cases, my guest has been assigned 2GB of memory.
>>>>
>>>> I have a dual core e6600.  I've tried allowing dom0 to use both
>>>> cores, offlining one core (using xend dom0_vcpu setting) after
>>>> boot, and restricting dom0 to only one core using the
>>>> dom0_max_vcpus xen hypervisor parameter.  In all of these cases
>>>> I've tried both one and two vcpus for the guest.  My success with
>>>> VGA passthrough seems somewhat random and no combination of cpu
>>>> assignment seems to have any effect.
>>>>
>>>> I have not tried with the 2.6.18-xen kernel as I haven't gotten it
>>>> to boot on my hardware; it can never find my volume group, even if
>>>> I create an initrd with all of the required modules, or build those
>>>> drivers into the kernel.  I have not spent more than maybe a half
>>>> an hour on this problem; I suspect it may have something to do
>>>> with the version of mkinitrd I'm using (from Fedora 9 x64).
>>>>
>>>> If anyone else has any insight or similar experience I'd also love
>>>> to hear it.
>>>>
>>>> Doug Magee
>>>> djmagee@mageenet.com
>>>>
>>>> -----Original Message-----
>>>> From: xen-devel-bounces@lists.xensource.com
>>>> [mailto:xen-devel-bounces@lists.xensource.com] On Behalf Of Mr. Teo
>>>> En Ming (Zhang Enming)
>>>> Sent: Tuesday, August 25, 2009 11:57 AM
>>>> To: enming.teo@asiasoftsea.net
>>>> Cc: xen-devel@lists.xensource.com
>>>> Subject: Re: [Xen-devel] nVidia Geforce 8400 GS PCI Express x16 VGA
>>>> PassThroughto Windows XP Home 32-bit HVM Virtual Machine with
>>>> Intel Desktop BoardDQ45CB
>>>>
>>>> I have uninstalled Xen 3.4.1 and installed Xen 3.5-unstable as
>>>> suggested by Weidong.
>>>>
>>>>
>>>> On 08/25/2009 11:47 PM, Mr. Teo En Ming (Zhang Enming) wrote:
>>>>> Dear All,
>>>>>
>>>>> I have managed to do PCI-e VGA passthrough with the open source
>>>>> Xen but the work is still in progress because although Windows XP
>>>>> guest can see the REAL PCI-e x16 graphics card instead of an
>>>>> emulated graphics driver, it cannot be initialized yet.
>>>>>
>>>>> Thanks to Intel Engineer Han Weidong, Pasi Kärkkäinen, Boris
>>>>> Derzhavets, Marc, Caz Yokoyama, and others who have helped me and
>>>>> shared their knowledge with me along the way.
>>>>>
>>>>> System Configuration:
>>>>>
>>>>> Intel Desktop Board DQ45CB with BIOS upgraded to 0093
>>>>> Onboard Intel GMA 4500 Graphics (IGD)
>>>>> nVidia Geforce 8400 GS PCI Express x16 Graphics Card
>>>>>
>>>>> Fedora 11 Linux 64-bit Xen paravirt operations Domain 0 Host Operating System
>>>>> Xen 3.5 Unstable/Development Type 1 Hypervisor
>>>>> Jeremy Fitzhardinge's Xen paravirt-ops domain 0 Kernel 2.6.31-rc6
>>>>> Primary Video Adapter in BIOS: IGD
>>>>>
>>>>> Please see the screenshots and my blog at the link here:
>>>>>
>>>>>
>>>>> http://teo-en-ming-aka-zhang-enming.blogspot.com/2009/08/nvidia-geforce-8400-gs-pci-express-x16.html
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xensource.com
>>>> http://lists.xensource.com/xen-devel
>>>>
>>>
>>> No virus found in this incoming message.
>>> Checked by AVG - www.avg.com
>>> Version: 8.5.392 / Virus Database: 270.13.65/2324 - Release Date:
>>> 08/25/09 18:07:00
>>>
>>> No virus found in this outgoing message.
>>> Checked by AVG - www.avg.com
>>> Version: 8.5.392 / Virus Database: 270.13.65/2324 - Release Date:
>>> 08/25/09 18:07:00

^ permalink raw reply

* RE: write_tsc in a PV domain?
From: Dan Magenheimer @ 2009-08-28  3:29 UTC (permalink / raw)
  To: Jeremy Fitzhardinge, Alan Cox; +Cc: Xen-Devel (E-mail), Keir Fraser
In-Reply-To: <4A96DA35.2020109@goop.org>

> On 08/27/09 01:48, Alan Cox wrote:
> >> as part of its ABI.  It is not a general tsc.  You can't meaningfully
> >> execute "rdtsc" without also being (indirectly) aware of what pcpu its
> >> running on and applying the appropriate corrections to turn it into
> >> system monotonic time.  Executing rdtsc willy-nilly gets you useless
> >> results; fortunately no PV Xen kernel does that.
> >>
> > Actually for user space this isn't at all true. You can use rdtsc
> > directly and sample the data for things like profiling then correct for
> > things like spikes and skews from processor switches by filtering.
>
> If an app is sophisticated to do this correctly then it doesn't need any
> special assistance from a hypervisor to make the tsc well-behaved.  It
> should continue to work even in a Xen guest where both the process can
> skip between VCPUs and the VCPUs can skip between PCPUs.

No, I don't think this is true.  An enterprise app that binds processes
to fixed physical processors on a physical machine can make
assumptions about the results of rdtsc that aren't valid when
the vcpus can skip between pcpus.  Further, like Linux itself,
applications may test assumptions about tsc at startup that are
assumed to remain valid for the life of the app, which is
perfectly reasonable on a physical machine and a bad mistake
in a virtualized environment.
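
To make the discussion concrete, the userspace filtering mentioned in the
quoted text might look roughly like the sketch below. This is only my
illustration, not code from any real application: the function name and
the max_delta threshold are hypothetical, and the samples are assumed to
be raw counter values already collected elsewhere (for example with
rdtsc).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum the deltas between consecutive counter samples, skipping any delta
 * that goes backwards or exceeds max_delta. Both kinds of jump are
 * treated here as a sign that the sampler moved to a CPU with a
 * different counter. Returns the filtered total and, if skipped is not
 * NULL, stores the number of deltas that were dropped.
 */
static uint64_t sketch_filtered_elapsed(const uint64_t *samples, size_t n,
					uint64_t max_delta, size_t *skipped)
{
	uint64_t total = 0;
	size_t dropped = 0;

	for (size_t i = 1; i < n; i++) {
		if (samples[i] < samples[i - 1] ||
		    samples[i] - samples[i - 1] > max_delta) {
			dropped++;    /* backwards step or spike: ignore it */
			continue;
		}
		total += samples[i] - samples[i - 1];
	}
	if (skipped)
		*skipped = dropped;
	return total;
}
```

Note that a filter like this only catches jumps larger than the chosen
threshold. A small skew introduced when a vcpu moves between pcpus passes
straight through, which is exactly the kind of assumption that holds on a
physical machine and not in a guest.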

> >> No, write_tsc is meaningless, and anyone trying to execute it is not
> >> even wrong.
> >>
> > Writing to the tsc is perfectly reasonable providing the tsc is an
> > advertised feature. Being able to use the tsc becomes much more relevant
> > with newer processors which have sane tsc implementations in the
> > architecture however.
>
> Apparently on some large servers the tsc is only synced and sane within
> a NUMA node, and not globally across all processors, so any app which
> assumed sane tsc behaviour would break when the hardware gets scaled up.

True, but any app that tries to run on a NUMA machine without
being aware of the idiosyncrasies of a NUMA machine probably
has worse problems to deal with than tsc sync.  Further, there
are many many apps that will likely never ever run on those
machines.  Are we going to penalize all apps all the time
because some might run some of the time on a machine where
tsc is not synced?

> But in this case I'm talking specifically about a Xen PV guest, where
> the tsc is claimed for use by the Xen clocksource ABI.

I just don't understand how you can say that a valid userland
instruction is "claimed for use" by Xen (or Linux or both).

^ permalink raw reply

