* lvm and locales memory issue
@ 2010-02-18 9:19 Zdenek Kabelac
2010-02-18 12:48 ` Milan Broz
0 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-18 9:19 UTC (permalink / raw)
To: lvm-devel
Hi
As we discussed yesterday on the conf call, there is a problem with
mlockall(), and we cannot easily disable locales because of POSIX
compliance and internationalized error messages.
How about adding a global{} configuration option?
e.g.: "use_plain_C_locales = 0/1"
defaulting to 0.
Anaconda, and any system admin who cares about memory, could easily
enable it by switching it to 1 - that would be slightly more comfortable than
writing wrapper scripts around the lvm command to do the same thing.
The question is how to implement this - either we delay the call to
setlocale() until we have read the config - or we just call setlocale("C"...)
right before mlockall() if the option is set in the configuration?
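A minimal sketch of the first variant (deciding on the locale only after the
config has been read). The struct and the function name are hypothetical
placeholders, not existing lvm code; only setlocale() itself is real:

```c
#include <assert.h>
#include <locale.h>

/* Hypothetical stand-in for the parsed global{} section of the config. */
struct global_config {
    int use_plain_C_locales;    /* the proposed option, defaulting to 0 */
};

/*
 * Call setlocale() only once the config is known, so that with the
 * option enabled no locale other than "C" is ever requested.
 * Returns the name of the locale now in effect, or NULL on failure.
 */
const char *init_locale(const struct global_config *cfg)
{
    assert(cfg != NULL);

    if (cfg->use_plain_C_locales)
        return setlocale(LC_ALL, "C");

    /* "" selects the locale from LANG / LC_* in the environment. */
    return setlocale(LC_ALL, "");
}
```

The caller would run this after config parsing and before anything that
prints messages; whether that ordering is practical in lvm is an open point.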
Any better idea?
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-18 9:19 lvm and locales memory issue Zdenek Kabelac
@ 2010-02-18 12:48 ` Milan Broz
2010-02-18 13:26 ` Zdenek Kabelac
0 siblings, 1 reply; 22+ messages in thread
From: Milan Broz @ 2010-02-18 12:48 UTC (permalink / raw)
To: lvm-devel
On 02/18/2010 10:19 AM, Zdenek Kabelac wrote:
> As we discussed yesterday on the conf call, there is a problem with
> mlockall(), and we cannot easily disable locales because of POSIX
> compliance and internationalized error messages.
So here is a solution. But what's the problem? :-)
For the kind readers of lvm-devel who did not attend the conf call(s),
can you please describe the problem you are trying to solve?
Is it a glibc locale-handling bug? Or does lvm use locales the wrong way?
Which other programs are affected too?
Several programs use mlockall() (e.g. I am doing the same operation
as lvm in cryptsetup); some of them use locales, some do not.
Even some libraries can lock memory (gcrypt & safe pool allocation).
> How about adding a global{} configuration option?
>
> e.g.: "use_plain_C_locales = 0/1"
I couldn't resist, but this seems to me like
We_have_no_idea_what_is_going_on_but_setting_this_to_zero_decreases_memory_use = 0 ;-)
Milan
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-18 12:48 ` Milan Broz
@ 2010-02-18 13:26 ` Zdenek Kabelac
2010-02-19 16:11 ` Zdenek Kabelac
0 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-18 13:26 UTC (permalink / raw)
To: lvm-devel
On 18.2.2010 13:48, Milan Broz wrote:
> On 02/18/2010 10:19 AM, Zdenek Kabelac wrote:
>> As we discussed yesterday on the conf call, there is a problem with
>> mlockall(), and we cannot easily disable locales because of POSIX
>> compliance and internationalized error messages.
>
> So here is a solution. But what's the problem? :-)
>
> For the kind readers of lvm-devel who did not attend the conf call(s),
> can you please describe the problem you are trying to solve?
>
> Is it a glibc locale-handling bug? Or does lvm use locales the wrong way?
> Which other programs are affected too?
>
> Several programs use mlockall() (e.g. I am doing the same operation
> as lvm in cryptsetup); some of them use locales, some do not.
> Even some libraries can lock memory (gcrypt & safe pool allocation).
>
>> How about adding a global{} configuration option?
>>
>> e.g.: "use_plain_C_locales = 0/1"
>
> I couldn't resist, but this seems to me like
> We_have_no_idea_what_is_going_on_but_setting_this_to_zero_decreases_memory_use = 0 ;-)
>
Well, we know quite well what is going on here; it is the solution that we
are not sure about. :)
Please check the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=553193
The problem is that we should support locales, to give localized system error
messages to the user. So simply never enabling locales is not an option - we
cannot be smarter than the user and stop him from getting a nice, e.g. Czech,
error message about running out of memory if he wants one.
Some distributions support customized locale-archive generation, so the
memory taken by this file does not approach 100MB as it does on current
Fedora. If the administrator selects only a few locales - typically English
and his native language - the size stays within a few MB, so it is not a real
problem anymore. However, on Fedora, for example, you can't simply create a
smaller file, so running a command like lvm in fact puts a 100MB file in
memory - quite fast and unnoticeable on a 2GB machine with an SSD drive, but
a bit of a problem on a kvm/xen guest with limited memory...
We are trying to find a solution. Some of the options require modifying
someone else's userspace code, and some might lead to modifications of our
code.
Glibc most probably cannot avoid mmapping the whole file into memory once it
is asked to do so - as the user could change locales in the mlockall() state,
and that would lead to a disk read operation.
This actually reminds me - if glibc can switch locales at any time, why does
it not load the locale-archive file in the case of the C locale? Is this a
'bug or feature'?
I think there is no big chance of getting individually generated locale files
in the distribution anytime soon, and it wouldn't help all that much with
multilingual installations (e.g. Anaconda).
That's the current state; we are looking for a way to use locales and
mlockall() together in one program.
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
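To see the effect described in the message above on a given machine, a
small measurement helper along these lines could be used. It is only a
sketch and assumes Linux: it reads the VmLck field from /proc/self/status,
and the function names are made up for illustration. mlockall() may fail
without sufficient privileges or with a low RLIMIT_MEMLOCK, in which case
the second function reports -1.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Return the value (in kB) of a "Key:   N kB" line from
 * /proc/self/status, or -1 if the file or the key is not available.
 */
long status_kb(const char *key)
{
    char line[256];
    long kb = -1;
    size_t klen;
    FILE *f;

    assert(key != NULL);
    klen = strlen(key);

    f = fopen("/proc/self/status", "r");
    if (!f)
        return -1;

    while (fgets(line, sizeof(line), f)) {
        if (!strncmp(line, key, klen) && line[klen] == ':') {
            if (sscanf(line + klen + 1, "%ld", &kb) != 1)
                kb = -1;
            break;
        }
    }
    fclose(f);
    return kb;
}

/*
 * Select the locale from the environment, lock all current and future
 * pages, and report how many kB ended up locked (-1 on failure).
 */
long locked_kb_with_env_locale(void)
{
    setlocale(LC_ALL, "");
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        return -1;
    return status_kb("VmLck");
}
```

Running it once with LANG=C and once with, say, LANG=cs_CZ.UTF-8 should show
the difference caused by the mapped locale-archive, if the analysis in this
thread is right.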
* lvm and locales memory issue
2010-02-18 13:26 ` Zdenek Kabelac
@ 2010-02-19 16:11 ` Zdenek Kabelac
2010-02-19 16:30 ` Alasdair G Kergon
0 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-19 16:11 UTC (permalink / raw)
To: lvm-devel
On 18.2.2010 14:26, Zdenek Kabelac wrote:
> On 18.2.2010 13:48, Milan Broz wrote:
>> On 02/18/2010 10:19 AM, Zdenek Kabelac wrote:
>>> As we discussed yesterday on the conf call, there is a problem with
>>> mlockall(), and we cannot easily disable locales because of POSIX
>>> compliance and internationalized error messages.
>>
>> So here is a solution. But what's the problem? :-)
>>
>> For the kind readers of lvm-devel who did not attend the conf call(s),
>> can you please describe the problem you are trying to solve?
>>
>> Is it a glibc locale-handling bug? Or does lvm use locales the wrong way?
>> Which other programs are affected too?
>>
>> Several programs use mlockall() (e.g. I am doing the same operation
>> as lvm in cryptsetup); some of them use locales, some do not.
>> Even some libraries can lock memory (gcrypt & safe pool allocation).
>>
>>> How about adding a global{} configuration option?
>>>
>>> e.g.: "use_plain_C_locales = 0/1"
>>
>> I couldn't resist, but this seems to me like
>> We_have_no_idea_what_is_going_on_but_setting_this_to_zero_decreases_memory_use = 0 ;-)
>>
>
> Well, we know quite well what is going on here; it is the solution that we
> are not sure about. :)
>
> Please check the bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=553193
>
> The problem is that we should support locales, to give localized system error
> messages to the user. So simply never enabling locales is not an option - we
> cannot be smarter than the user and stop him from getting a nice, e.g. Czech,
> error message about running out of memory if he wants one.
OK,
After some discussion with Jakub, here is the outcome of our chat:
There are no plans to change anything on the glibc side about the handling of
locales - locale-archive is an optimization which seems to share a lot of
common tables between the various locales in one place. Separate locale files
would probably take several times as much space as the current single large
file.
Once the file is mapped (locale set), it stays mapped - we cannot switch back
to 'C' in the hope that the locale-archive would be unmapped.
As a side note - there are also other potentially huge files mmapped by
glibc, e.g. the nscd database.
Jakub suggests using a tiny forked process with very limited
functionality to handle the mlockall() task without any use of locales.
Error reports, debug output and other things should then be handled in the
surrounding environment - just as we have already discussed as one
potential solution.
So please feel free to add any comment.
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-19 16:11 ` Zdenek Kabelac
@ 2010-02-19 16:30 ` Alasdair G Kergon
2010-02-22 10:55 ` Zdenek Kabelac
0 siblings, 1 reply; 22+ messages in thread
From: Alasdair G Kergon @ 2010-02-19 16:30 UTC (permalink / raw)
To: lvm-devel
On Fri, Feb 19, 2010 at 05:11:09PM +0100, Zdenek Kabelac wrote:
> Jakub suggests using a tiny forked process with very limited
> functionality to handle the mlockall() task without any use of locales.
Nack. It's not tiny - it's most of the code - and it's a *state* of usage of
the code - a way of using the *same* code as we also use without mlockall at
other times - the code knows to behave slightly differently depending on whether
or not it is in this state.
Surely most people don't need most of the locales - they should be able to
choose which ones to cache in the archive file - the ones they regularly use -
and take the performance hit on the occasions (if any) they want to use the
non-cached ones.
I notice the code to build the archive is in a 'fedora' subdir in the spec file
with hard-coded pathnames. Is there an 'upstream' approach here or do different
distributions handle this differently and if so why?
> Error reports, debug output and other things should then be handled in the
> surrounding environment - just as we have already discussed as one
> potential solution.
Opposite way around. The hack would be to push all the things that attempt to
access that locale archive file out into a subprocess. That would first require
an audit of all the glibc functions we use to determine which ones can attempt
to access that file. Then we'd have to place wrappers around those functions
to push them into a subprocess and avoid using them synchronously when in this
'mlocked' state.
Alasdair
^ permalink raw reply [flat|nested] 22+ messages in thread
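To make the "push it into a subprocess" direction from the message above a
bit more concrete, here is a rough sketch of one such wrapper: the parent
never calls setlocale() itself, and a short-lived child produces the
localized strerror() text and sends it back over a pipe. The function name
is hypothetical and this is not lvm code; it also glosses over the audit of
glibc functions mentioned above. One relevant fact from the mlock(2) manual
page is that memory locks are not inherited by a child created via fork().

```c
#include <assert.h>
#include <locale.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fetch the localized text for errnum from a child process, so that
 * only the child activates the user's locale.
 * Returns 0 on success and -1 on failure; buf is NUL-terminated
 * whenever the pipe could be created.
 */
int child_strerror(int errnum, char *buf, size_t len)
{
    int fds[2];
    int status;
    pid_t pid;
    ssize_t n;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: the only place where the locale is switched. */
        const char *msg;

        close(fds[0]);
        setlocale(LC_ALL, "");
        msg = strerror(errnum);
        if (write(fds[1], msg, strlen(msg)) < 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: read the message text until the child closes the pipe. */
    close(fds[1]);
    while (used < len - 1 &&
           (n = read(fds[0], buf + used, len - 1 - used)) > 0)
        used += (size_t) n;
    buf[used] = '\0';
    close(fds[0]);

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

Forking per message is obviously expensive; a real design would more likely
keep one helper process around, which is part of why this is called a hack.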
* lvm and locales memory issue
2010-02-19 16:30 ` Alasdair G Kergon
@ 2010-02-22 10:55 ` Zdenek Kabelac
2010-02-22 13:16 ` Zdenek Kabelac
0 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-22 10:55 UTC (permalink / raw)
To: lvm-devel
On 19.2.2010 17:30, Alasdair G Kergon wrote:
> On Fri, Feb 19, 2010 at 05:11:09PM +0100, Zdenek Kabelac wrote:
>> Jakub suggests using a tiny forked process with very limited
>> functionality to handle the mlockall() task without any use of locales.
>
> Nack. It's not tiny - it's most of the code - and it's a *state* of usage of
> the code - a way of using the *same* code as we also use without mlockall at
> other times - the code knows to behave slightly differently depending on whether
> or not it is in this state.
>
> Surely most people don't need most of the locales - they should be able to
> choose which ones to cache in the archive file - the ones they regularly use -
> and take the performance hit on the occasions (if any) they want to use the
> non-cached ones.
>
> I notice the code to build the archive is in a 'fedora' subdir in the spec file
> with hard-coded pathnames. Is there an 'upstream' approach here or do different
> distributions handle this differently and if so why?
>
>> Error reports, debug output and other things should then be handled in the
>> surrounding environment - just as we have already discussed as one
>> potential solution.
>
> Opposite way around. The hack would be to push all the things that attempt to
> access that locale archive file out into a subprocess. That would first require
> an audit of all the glibc functions we use to determine which ones can attempt
> to access that file. Then we'd have to place wrappers around those functions
> to push them into a subprocess and avoid using them synchronously when in this
> 'mlocked' state.
>
Here is the outcome of another chat with Jakub:
It looks like there is no easy way to modify the glibc API in the near
future, and this change would have to go through Uli first...
(there seem to be things like 'newlocale()' which make it more complicated)
As a workaround for 'memory-limited' environments, the following is suggested.
Remove the cache file:
'rm -f /usr/lib/locale/locale-archive'
and create just the locales you need:
'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8'
(In this case small separate files are created in the directory
"/usr/lib/locale/cs_CZ.utf8" and opened at application runtime
- this is probably less efficient than the second way.)
Alternatively, recreate /usr/lib/locale/locale-archive with this command:
'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8'
(In this case only a 0.5MB file is generated. Adding further locales just
needs another localedef call each - e.g. en_US, de_DE and cs_CZ together lead
to about 3MB on my current installation.)
The memory footprint of a test case with just cs_CZ - i.e. main() {
setlocale(); mlockall(); } - is approximately 4MB, compared to 100MB with
the normal, full locale-archive.
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-22 10:55 ` Zdenek Kabelac
@ 2010-02-22 13:16 ` Zdenek Kabelac
2010-02-22 18:11 ` Alasdair G Kergon
0 siblings, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-22 13:16 UTC (permalink / raw)
To: lvm-devel
On 22.2.2010 11:55, Zdenek Kabelac wrote:
> On 19.2.2010 17:30, Alasdair G Kergon wrote:
>> On Fri, Feb 19, 2010 at 05:11:09PM +0100, Zdenek Kabelac wrote:
>>> Jakub suggests using a tiny forked process with very limited
>>> functionality to handle the mlockall() task without any use of locales.
>>
>> Nack. It's not tiny - it's most of the code - and it's a *state* of usage of
>> the code - a way of using the *same* code as we also use without mlockall at
>> other times - the code knows to behave slightly differently depending on whether
>> or not it is in this state.
>>
>> Surely most people don't need most of the locales - they should be able to
>> choose which ones to cache in the archive file - the ones they regularly use -
>> and take the performance hit on the occasions (if any) they want to use the
>> non-cached ones.
>>
>> I notice the code to build the archive is in a 'fedora' subdir in the spec file
>> with hard-coded pathnames. Is there an 'upstream' approach here or do different
>> distributions handle this differently and if so why?
>>
>>> Error reports, debug output and other things should then be handled in the
>>> surrounding environment - just as we have already discussed as one
>>> potential solution.
>>
>> Opposite way around. The hack would be to push all the things that attempt to
>> access that locale archive file out into a subprocess. That would first require
>> an audit of all the glibc functions we use to determine which ones can attempt
>> to access that file. Then we'd have to place wrappers around those functions
>> to push them into a subprocess and avoid using them synchronously when in this
>> 'mlocked' state.
>>
>
> Here is the outcome of another chat with Jakub:
>
> It looks like there is no easy way to modify the glibc API in the near
> future, and this change would have to go through Uli first...
> (there seem to be things like 'newlocale()' which make it more complicated)
>
> As a workaround for 'memory-limited' environments, the following is suggested.
>
> Remove the cache file:
> 'rm -f /usr/lib/locale/locale-archive'
>
> and create just the locales you need:
> 'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8'
> (In this case small separate files are created in the directory
> "/usr/lib/locale/cs_CZ.utf8" and opened at application runtime
> - this is probably less efficient than the second way.)
>
> Alternatively, recreate /usr/lib/locale/locale-archive with this command:
> 'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8'
> (In this case only a 0.5MB file is generated. Adding further locales just
> needs another localedef call each - e.g. en_US, de_DE and cs_CZ together lead
> to about 3MB on my current installation.)
>
> The memory footprint of a test case with just cs_CZ - i.e. main() {
> setlocale(); mlockall(); } - is approximately 4MB, compared to 100MB with
> the normal, full locale-archive.
>
To get a solution started for our anaconda problems, and after some
discussion with an anaconda team member, I've created this bugzilla (as he
wished):
https://bugzilla.redhat.com/show_bug.cgi?id=567252
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-22 13:16 ` Zdenek Kabelac
@ 2010-02-22 18:11 ` Alasdair G Kergon
2010-02-22 18:23 ` Jakub Jelinek
0 siblings, 1 reply; 22+ messages in thread
From: Alasdair G Kergon @ 2010-02-22 18:11 UTC (permalink / raw)
To: lvm-devel
On Mon, Feb 22, 2010 at 02:16:38PM +0100, Zdenek Kabelac wrote:
> On 22.2.2010 11:55, Zdenek Kabelac wrote:
> > 'rm -f /usr/lib/locale/locale-archive'
> > 'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8'
> > 'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8'
%attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %config(missingok,noreplace) %{_prefix}/lib/locale/locale-archive
So removing/changing that file is a fully-supported process?
Perhaps anaconda should automatically remove it (if it has not been customised) on
any system with < 640MB RAM?
Or, better, the RPM could be updated not to install it on low-memory systems.
Alasdair
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-22 18:11 ` Alasdair G Kergon
@ 2010-02-22 18:23 ` Jakub Jelinek
2010-02-22 18:51 ` Alasdair G Kergon
2010-02-23 8:52 ` Zdenek Kabelac
0 siblings, 2 replies; 22+ messages in thread
From: Jakub Jelinek @ 2010-02-22 18:23 UTC (permalink / raw)
To: lvm-devel
On Mon, Feb 22, 2010 at 06:11:50PM +0000, Alasdair G Kergon wrote:
> On Mon, Feb 22, 2010 at 02:16:38PM +0100, Zdenek Kabelac wrote:
> > On 22.2.2010 11:55, Zdenek Kabelac wrote:
> > > 'rm -f /usr/lib/locale/locale-archive'
> > > 'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8'
> > > 'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8'
>
> %attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %config(missingok,noreplace) %{_prefix}/lib/locale/locale-archive
>
> So removing/changing that file is a fully-supported process?
Of course not. The reason it has these flags is for glibc upgrading
purposes. The glibc-common rpm ships with a locale-archive.tmpl file, and %post
merges all locales from that file with any user-added locales in
locale-archive into a new locale-archive; the *.tmpl file is then deleted.
> Perhaps anaconda should automatically remove it (if it has not been customised) on
> any system with < 640MB RAM?
If you delete the file, you lose all localization, because we don't ship
the individual /usr/lib/locale/*_*/* locale files for space reasons.
The same effect as if you don't call setlocale at all, or just with "C" in
all apps.
> Or, better, the RPM could be updated not to install it on low-memory systems.
That's a bad idea.
Jakub
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-22 18:23 ` Jakub Jelinek
@ 2010-02-22 18:51 ` Alasdair G Kergon
2010-02-22 19:05 ` Jakub Jelinek
2010-02-23 8:52 ` Zdenek Kabelac
1 sibling, 1 reply; 22+ messages in thread
From: Alasdair G Kergon @ 2010-02-22 18:51 UTC (permalink / raw)
To: lvm-devel
On Mon, Feb 22, 2010 at 07:23:01PM +0100, Jakub Jelinek wrote:
> That's a bad idea.
So we're still no closer to a solution.
So far the only realistic workaround I can think of for Fedora is:
At lvm startup, check the size of the locale-archive file and the
amount of memory on the system. If it looks critical, issue a warning
and disable locale-setting in LVM.
(Which other distros are affected as well as Fedora/RHEL/CentOS?)
This workaround still does not address the fundamental problem:
glibc is holding open a large mmaped file on behalf of the process and
seemingly offering no way for the process to instruct it to close it without
exiting.
Alasdair
^ permalink raw reply [flat|nested] 22+ messages in thread
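The startup check proposed in the message above could look roughly like
this. It is a sketch only: the function names are invented, the "more than
1/8 of physical RAM" threshold is an arbitrary assumption standing in for
"looks critical", and sysconf(_SC_PHYS_PAGES) is a glibc/Linux extension
rather than portable POSIX.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Size of a file in bytes, or -1 if it cannot be stat()ed. */
long long file_size_bytes(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}

/* Physical RAM in bytes, or -1 if the system does not report it. */
long long physical_ram_bytes(void)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long page_size = sysconf(_SC_PAGESIZE);

    if (pages < 0 || page_size < 0)
        return -1;
    return (long long) pages * page_size;
}

/*
 * Policy only: treat the archive as critical when it would pin more
 * than 1/8 of RAM (assumed threshold, not taken from lvm).
 */
int locale_archive_too_big(long long archive_bytes, long long ram_bytes)
{
    assert(ram_bytes > 0);
    return archive_bytes > ram_bytes / 8;
}

/*
 * Returns 1 if the caller should warn and stay in the "C" locale,
 * 0 otherwise (including when the sizes cannot be determined).
 */
int should_skip_setlocale(const char *archive_path)
{
    long long size = file_size_bytes(archive_path);
    long long ram = physical_ram_bytes();

    if (size < 0 || ram <= 0)
        return 0;
    return locale_archive_too_big(size, ram);
}
```

With the numbers from this thread, a 100MB archive on a 512MB guest would
trip the check (100MB > 64MB), while a 4MB customised archive would not.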
* lvm and locales memory issue
2010-02-22 18:51 ` Alasdair G Kergon
@ 2010-02-22 19:05 ` Jakub Jelinek
0 siblings, 0 replies; 22+ messages in thread
From: Jakub Jelinek @ 2010-02-22 19:05 UTC (permalink / raw)
To: lvm-devel
On Mon, Feb 22, 2010 at 06:51:00PM +0000, Alasdair G Kergon wrote:
> This workaround still does not address the fundamental problem:
> glibc is holding open a large mmaped file on behalf of the process and
> seemingly offering no way for the process to instruct it to close it without
> exiting.
Guess we disagree on what the fundamental problem is. IMHO it is a big app
calling mlockall after it has called lots of functions from lots of
libraries. mlockall really is meant to be used only in very small, tightly
controlled apps; requiring every library function to be mlockall-aware and
to have some API for flushing before mlockall is unrealistic.
Jakub
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-22 18:23 ` Jakub Jelinek
2010-02-22 18:51 ` Alasdair G Kergon
@ 2010-02-23 8:52 ` Zdenek Kabelac
2010-02-23 9:15 ` Jakub Jelinek
2010-02-23 15:17 ` Zdenek Kabelac
1 sibling, 2 replies; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-23 8:52 UTC (permalink / raw)
To: lvm-devel
On 22.2.2010 19:23, Jakub Jelinek wrote:
> On Mon, Feb 22, 2010 at 06:11:50PM +0000, Alasdair G Kergon wrote:
>> On Mon, Feb 22, 2010 at 02:16:38PM +0100, Zdenek Kabelac wrote:
>>> On 22.2.2010 11:55, Zdenek Kabelac wrote:
>>>> 'rm -f /usr/lib/locale/locale-archive'
>>>> 'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8'
>>>> 'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8'
>>
>> %attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %config(missingok,noreplace) %{_prefix}/lib/locale/locale-archive
>>
>> So removing/changing that file is a fully-supported process?
>
> Of course not. The reason it has these flags is for glibc upgrading
> purposes. The glibc-common rpm ships with a locale-archive.tmpl file, and %post
> merges all locales from that file with any user-added locales in
> locale-archive into a new locale-archive; the *.tmpl file is then deleted.
>
>> Perhaps anaconda should automatically remove it (if it has not been customised) on
>> any system with < 640MB RAM?
>
> If you delete the file, you lose all localization, because we don't ship
> the individual /usr/lib/locale/*_*/* locale files for space reasons.
> The same effect as if you don't call setlocale at all, or just with "C" in
> all apps.
>
Ok - and now I'm getting confused and lost here.
From our chat I got the impression that using 'localedef' is a perfectly valid
way to create usable content for /usr/lib/locale.
On my Fedora Rawhide system I have /usr/share/i18n/locales at 6MB and
/usr/share/locale/cs at 12MB, which contains among other things a 128KB libc.mo
file and a lot of other files.
From my simple test program I do get valid Czech error messages and properly
localized strftime() output from glibc calls when I recreate
/usr/lib/locale/locale-archive with the 'localedef' command above.
So what is the purpose of /usr/share/i18n/locales and /usr/share/locale in
this case? What am I missing if the locale-archive.tmpl file is not used?
Is the Czech locale special, and are there some other locales which could
not be recreated so easily?
(btw it takes 1.3 sec to create a locale-archive with 1 Czech locale, so for
200 locales a complete rebuild of the locale-archive file could take maybe
4 minutes)
It seems to me that my glibc-common contains all the files needed to create
a usable locale-archive even without locale-archive.tmpl - am I missing
something here?
From strace it looks like only the content of /usr/share/i18n/locales
matters, and that these files are translated from text form to binary form.
Files from /usr/share/locale are opened at runtime when needed by the
application.
So I'm quite curious why the file /usr/lib/locale/locale-archive is actually
opened when only LC_MESSAGES is set to some locale. IMHO only the files from
/usr/share/locale should matter for this - I could assume it's because of
the alias handling, which is also hidden inside the cached binary files -
but it's quite an overkill, isn't it?
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-23 8:52 ` Zdenek Kabelac
@ 2010-02-23 9:15 ` Jakub Jelinek
2010-02-23 9:14 ` Zdenek Kabelac
2010-02-23 13:07 ` Zdenek Kabelac
2010-02-23 15:17 ` Zdenek Kabelac
1 sibling, 2 replies; 22+ messages in thread
From: Jakub Jelinek @ 2010-02-23 9:15 UTC (permalink / raw)
To: lvm-devel
On Tue, Feb 23, 2010 at 09:52:19AM +0100, Zdenek Kabelac wrote:
> Ok - and now I'm getting confused and lost here.
>
> From our chat I got the impression that using 'localedef' is a perfectly valid
> way to create usable content for /usr/lib/locale.
True, but a very costly one.
time for i in `cat /tmp/SUPPORTED`; do j=`echo $i | sed 's,/.*$,,'`; k=`echo $i | sed 's,^.*/,,'`; l=`echo $j | sed 's/\..*$//'`; localedef -A /usr/share/locale/locale.alias --no-archive -f $k -i $l -c /tmp/nyy/$j; done
real 6m12.985s
user 5m34.818s
sys 0m33.134s
Do you seriously suggest that we spend 6 minutes on very fast machines
during glibc-common upgrades? You must be joking.
Jakub
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-23 9:15 ` Jakub Jelinek
@ 2010-02-23 9:14 ` Zdenek Kabelac
2010-02-23 9:45 ` Jakub Jelinek
2010-02-23 13:07 ` Zdenek Kabelac
1 sibling, 1 reply; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-23 9:14 UTC (permalink / raw)
To: lvm-devel
On 23.2.2010 10:15, Jakub Jelinek wrote:
> On Tue, Feb 23, 2010 at 09:52:19AM +0100, Zdenek Kabelac wrote:
>> Ok - and now I'm getting confused and lost here.
>>
>> From our chat I got the impression that using 'localedef' is a perfectly valid
>> way to create usable content for /usr/lib/locale.
>
> True, but a very costly one.
>
> time for i in `cat /tmp/SUPPORTED`; do j=`echo $i | sed 's,/.*$,,'`; k=`echo $i | sed 's,^.*/,,'`; l=`echo $j | sed 's/\..*$//'`; localedef -A /usr/share/locale/locale.alias --no-archive -f $k -i $l -c /tmp/nyy/$j; done
>
> real 6m12.985s
> user 5m34.818s
> sys 0m33.134s
>
> Do you seriously suggest that we spend 6 minutes on very fast machines
> during glibc-common upgrades? You must be joking.
Well, an update of my rawhide usually takes more than 3/4 of an hour - so 6
minutes running in the background is really nothing.
And quite frankly - during the update you need to update/recompile only
changed files - you could copy compiled & unchanged data to the new file - so in
fact it would take a couple of seconds - unless each glibc update changes all i18n
locale definitions, which I doubt - isn't that what the locale-archive.tmpl is
already doing?
(And as a bonus you save package size, as you don't need to store the tmpl inside)
Zdenek
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-23 9:14 ` Zdenek Kabelac
@ 2010-02-23 9:45 ` Jakub Jelinek
2010-02-23 10:12 ` Zdenek Kabelac
0 siblings, 1 reply; 22+ messages in thread
From: Jakub Jelinek @ 2010-02-23 9:45 UTC (permalink / raw)
To: lvm-devel
On Tue, Feb 23, 2010 at 10:14:55AM +0100, Zdenek Kabelac wrote:
> Well, an update of my rawhide usually takes more than 3/4 of an hour - so 6
> minutes running in the background is really nothing.
Perhaps you don't care, but others do really care.
> And quite frankly - during the update you need to update/recompile only
> changed files - you could copy compiled & unchanged data to the new file - so in
> fact it would take a couple of seconds - unless each glibc update changes all i18n
> locale definitions, which I doubt - isn't that what the locale-archive.tmpl is
> already doing?
There is a big tree of inclusions between the locale files, so you'd need to
checksum them together with all the dependencies. Anyway, that would still
leave us with 6 minutes (on slow machines maybe half an hour) during Fedora
installation. I don't understand your sudden holy war against generated
data in the distro; after all, the locales definitely aren't the largest (but
are something everybody has installed). Look at stuff like kdelibs-apidocs
or asterisk-apidoc, which are 650+ and 320+ MB of generated data respectively.
And this wouldn't help you in any way for lvm; locale-archive would be the
same size...
Jakub
^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue
2010-02-23 9:45 ` Jakub Jelinek
@ 2010-02-23 10:12 ` Zdenek Kabelac
0 siblings, 0 replies; 22+ messages in thread
From: Zdenek Kabelac @ 2010-02-23 10:12 UTC (permalink / raw)
To: lvm-devel
On 23.2.2010 10:45, Jakub Jelinek wrote:
> On Tue, Feb 23, 2010 at 10:14:55AM +0100, Zdenek Kabelac wrote:
>> Well, an update of my rawhide usually takes more than 3/4 of an hour - so 6
>> minutes running in the background is really nothing.
>
> Perhaps you don't care, but others do really care.
>
Well, the key point here is that more than 90% of users really need only 1-3
locales on their system - so the majority of users could in fact spend
6 seconds regenerating the whole locale-archive file from scratch, and would
not need to transfer a bigger glibc-common and spend a lot of time handling
100MB files during updates. I simply believe that other distributions are
more user-friendly here. Of course I could be mistaken, and there could be a
big demand from Fedora users to switch their locales to every single
language, as nearly every Fedora user runs a lot of localization tests on
his machine all day, every day...
IMHO the locale-archive data could be generated in the background during the
whole lengthy upgrade process - to me this looks similar to ldconfig or
library prelinking - so the time is really not important. Once the new
locale-archive is created, it is swapped with the old file - where exactly do
you see the problem here?
In fact glibc-common may split the locale-archive file into a separate rpm -
for those who want to save the generation time and need all locales handy...
Another note - I do care a lot about speed, but also about space.
>> And quite frankly - during the update you need to update/recompile only
>> changed files - you could copy compiled & unchanged data to the new file - so in
>> fact it would take a couple of seconds - unless each glibc update changes all i18n
>> locale definitions, which I doubt - isn't that what the locale-archive.tmpl is
>> already doing?
>
> There is a big tree of inclusions between the locale files, so you'd need to
> checksum them together with all the dependencies. Anyway, that would still
> leave us with 6 minutes (on slow machines maybe half an hour) during Fedora
> installation. I don't understand your sudden holy war against generated
> data in the distro; after all, the locales definitely aren't the largest (but
Jakub, please note - it's not a holy war against generated data in the
distro. This thread is about finding a solution to the pointless locking of
large mmapped files into memory by mlockall() applications, when we have to
work in a limited 512MB xen/kvm guest.
You seem to be defending the position that glibc has the right to mmap/lock
large pieces of memory into the application's memory space for whatever
benefit it might bring, as glibc knows what's 'the best' for the user, and
that the user application should have no control over this. We, as an lvm
user of glibc, tend to believe this is the wrong way - we really cannot
rewrite the whole of LVM just because the 'feature of the week' in glibc
leads to such and such memory allocation/waste. There should be some balance
and control over this process.
In fact we are even missing a table listing the functions that are supposed
to work during mlockall(). As far as I'm aware we use only open/read/write,
the string handling functions, malloc/free and the printf family - at least
in the lvm case; we have some daemons which will probably need some
inspection for the mlockall() case.
> are something everybody has installed). Look at stuff like kdelibs-apidocs
> or asterisk-apidoc, which are 650+ and 320+ MB of generated data respectively.
I don't have them installed and I could easily live without such a piece of bloat. But there is 'no life' without glibc-common ;) Zdenek ^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue 2010-02-23 9:15 ` Jakub Jelinek 2010-02-23 9:14 ` Zdenek Kabelac @ 2010-02-23 13:07 ` Zdenek Kabelac 1 sibling, 0 replies; 22+ messages in thread From: Zdenek Kabelac @ 2010-02-23 13:07 UTC (permalink / raw) To: lvm-devel On 23.2.2010 10:15, Jakub Jelinek wrote: > On Tue, Feb 23, 2010 at 09:52:19AM +0100, Zdenek Kabelac wrote: >> Ok - and now I'm getting confused and lost here. >> >> From our chat I've got the impression that using 'localedef' is a perfectly valid >> way to create usable content for /usr/lib/locale. > > True, but a very costly one. > > time for i in `cat /tmp/SUPPORTED`; do j=`echo $i | sed 's,/.*$,,'`; k=`echo $i | sed 's,^.*/,,'`; l=`echo $j | sed 's/\..*$//'`; localedef -A /usr/share/locale/locale.alias --no-archive -f $k -i $l -c /tmp/nyy/$j; done > > real 6m12.985s > user 5m34.818s > sys 0m33.134s > > Do you seriously suggest that we spend 6 minutes on very fast machines > during glibc-common upgrades? You must be joking. > As a side note to this timing information - from my strace it looks like a major portion of this compilation time is spent parsing translit_* files, which seem to be the same for all locales - thus if there were just a tiny improvement in localedef, so that it could generate multiple locales at once, I'd assume the total compilation time could be much better... So it really depends on which part you want to optimize... Zdenek ^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue 2010-02-23 8:52 ` Zdenek Kabelac 2010-02-23 9:15 ` Jakub Jelinek @ 2010-02-23 15:17 ` Zdenek Kabelac 2010-02-23 16:28 ` Jakub Jelinek 1 sibling, 1 reply; 22+ messages in thread From: Zdenek Kabelac @ 2010-02-23 15:17 UTC (permalink / raw) To: lvm-devel On 23.2.2010 09:52, Zdenek Kabelac wrote: > On 22.2.2010 19:23, Jakub Jelinek wrote: >> On Mon, Feb 22, 2010 at 06:11:50PM +0000, Alasdair G Kergon wrote: >>> On Mon, Feb 22, 2010 at 02:16:38PM +0100, Zdenek Kabelac wrote: >>>> On 22.2.2010 11:55, Zdenek Kabelac wrote: >>>>> 'rm -f /usr/lib/locale/locale-archive' >>>>> 'localedef -f UTF-8 -i cs_CZ /usr/lib/locale/cs_CZ.utf8' >>>>> 'localedef -i cs_CZ -c -f UTF-8 -A /usr/share/locale/locale.alias cs_CZ.UTF-8' >>> >>> %attr(0644,root,root) %verify(not md5 size mtime mode) %ghost %config(missingok,noreplace) %{_prefix}/lib/locale/locale-archive >>> >>> So removing/changing that file is a fully-supported process? >> >> Of course not. The reason it has these flags is for glibc upgrading >> purposes. glibc-common rpm ships with locale-archive.tmpl file, and %post >> merges all locales from that file with any possible user added locales in >> locale-archive into a new locale-archive, the *.tmpl file is then deleted. >> >>> Perhaps anaconda should automatically remove it (if it has not been customised) on >>> any system with < 640MB RAM? >> >> If you delete the file, you lose all localization, because we don't ship >> the individual /usr/lib/locale/*_*/* locale files for space reasons. >> The same effect as if you don't call setlocale at all, or just with "C" in >> all apps. >> > > > Ok - and now I'm getting confused and lost here. > > From our chat I've got the impression that using 'localedef' is a perfectly valid > way to create usable content for /usr/lib/locale. > > On my Fedora Rawhide system I have /usr/share/i18n/locales 6MB and > /usr/share/locale/cs 12MB, that contains amongst other things 128KB libc.mo > file and a lot of other files.
> > From my simple test program I do get valid Czech locale error messages and > properly localized strftime() output from glibc calls in the case I recreate > /usr/lib/locale/locale-archive with the 'localedef' command above. > > So what is the purpose of /usr/share/i18n/locales, /usr/share/locale in this case? > > What do I miss in case the locale-archive.tmpl file is not in use? > > Is the Czech locale special and are there some other locales which could > not be easily recreated? > > (btw it takes 1.3sec to create 1 Czech locale-archive, thus it looks like for > 200 locales it could take maybe 4 minutes in case of a complete recreate of > the locale-archive file) > > It seems to me that my glibc-common contains all files needed to create > usable locale-archive even without locale-archive.tmpl - am I missing > something here? > > From strace it looks like only the content of /usr/share/i18n/locales does > matter and it translates files in string form to binary form. > Files from /usr/share/locale are opened at runtime when needed by the application. > > Thus I'm quite curious why the file /usr/lib/locale/locale-archive is actually > opened for the case that only LC_MESSAGES is set to some locale. > IMHO for this only files from /usr/share/locale should matter - I could > assume it's because of the aliasing handling which is also hidden inside > cached binary files - but it's pretty much overkill, isn't it? It looks like cs_CZ.utf8/LC_MESSAGES/SYS_LC_MESSAGES is just 59 bytes. There is something seriously wrong with the current glibc optimization to have 100MB locked into memory if you want to use 59 bytes from this file... A few more comments: locale-archive for cs_CZ.utf8 is ~475kB (with a 100kB hole inside), however the files in cs_CZ.utf8 total ~372kB. When we add the German de_DE.utf8 locale, the size of locale-archive basically follows the sum of cs_CZ.utf8 and de_DE.utf8 put together - no information is shared between them.
Looking at the size of /usr/share/i18n/locales/cs_CZ - one may start to wonder why the Czech locale defines collation for Arabic, Latin and other 'related' languages, while in German there is a simple 'copy "iso14651_t1"'. Another note could be - Ubuntu does not even use a locale-archive file and uses locales on a per-file basis - so now I'm getting curious, where are the tests that prove that Fedora gets some measurable performance advantage? (You would probably need 24000 page entries to create an mmap table for the whole 100MB file... and if only specific portions of this large file are mapped, then it's quite simple to switch these to malloc/read code when the user sets some flag...) Zdenek ^ permalink raw reply [flat|nested] 22+ messages in thread
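As a rough check of the page-count estimate in the message above, the following C sketch computes how many pages, and how many bytes of last-level page-table entries, a mapping of a given size needs. The 4 KiB page size and the 8-byte entry size are assumptions (typical for x86_64), the function names are made up for illustration, and the upper page-table levels are ignored.

```c
#include <stdint.h>

/* Number of pages needed to cover `bytes` bytes, rounded up.
 * `page_size` is an assumption supplied by the caller (e.g. 4096). */
uint64_t pages_needed(uint64_t bytes, uint64_t page_size)
{
    return (bytes + page_size - 1) / page_size;
}

/* Bytes occupied by the last-level page-table entries for `pages` pages.
 * `pte_size` is an assumption supplied by the caller (e.g. 8 on x86_64). */
uint64_t pte_bytes(uint64_t pages, uint64_t pte_size)
{
    return pages * pte_size;
}
```

For a file of 100,000,000 bytes, `pages_needed(100000000, 4096)` gives 24415 pages and `pte_bytes(24415, 8)` gives 195320 bytes, about 191 KiB. For 100 MiB (104,857,600 bytes) the same calls give 25600 pages and 204800 bytes, exactly 200 KiB.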
* lvm and locales memory issue 2010-02-23 15:17 ` Zdenek Kabelac @ 2010-02-23 16:28 ` Jakub Jelinek 2010-02-23 16:53 ` Alasdair G Kergon ` (2 more replies) 0 siblings, 3 replies; 22+ messages in thread
From: Jakub Jelinek @ 2010-02-23 16:28 UTC (permalink / raw)
To: lvm-devel

On Tue, Feb 23, 2010 at 04:17:39PM +0100, Zdenek Kabelac wrote:
> It looks like cs_CZ.utf8/LC_MESSAGES/SYS_LC_MESSAGES is just 59 bytes.
> There is something seriously wrong with the current glibc optimization to
> have 100MB locked into memory if you want to use 59 bytes from this file....

LC_MESSAGES contains just yesstr/nostr definition, nothing else.
But guess your application isn't asking just for LC_MESSAGES category...

> Looking at the size of /usr/share/i18n/locales/cs_CZ - one may start to wonder
> why Czech locales are defining collates for arabic latin and other 'related'
> languages, while in German there is simple 'copy "iso14651_t1"'

iso14651_t1 defines collation for all kinds of charsets, ideally cs_CZ
should just include that file too and tweak afterwards for the differences
Czech ordering requires.

> Another note could be - Ubuntu does not even use locale-archive file and uses
> locales on per file basis - so now I'm getting curious, where are the tests,

Not everything Ubuntu does is necessarily a good idea.

> that proves that Fedora gets some measurable performance advantage?

Try something trivial, like:
#include <locale.h>

int
main (void)
{
  int i;
  for (i = 0; i < 1000000; i++)
    switch (i % 5)
      {
      case 0: setlocale (LC_ALL, "C"); break;
      case 1: setlocale (LC_ALL, "en_US.UTF-8"); break;
      case 2: setlocale (LC_ALL, "cs_CZ.UTF-8"); break;
      case 3: setlocale (LC_ALL, "fr_FR.UTF-8"); break;
      case 4: setlocale (LC_ALL, "de_DE.UTF-8"); break;
      }
  return 0;
}

With locale-archive 1.362s, without, using locale files, 10.355s. And
that is not even using a locale name that needs alias lookup, which would
need parsing of locale.alias too.
As this is something almost every program calls at least once, it is not a good idea to slow this down completely unnecessarily. Jakub ^ permalink raw reply [flat|nested] 22+ messages in thread
* lvm and locales memory issue 2010-02-23 16:28 ` Jakub Jelinek @ 2010-02-23 16:53 ` Alasdair G Kergon 2010-02-23 16:56 ` Zdenek Kabelac 2010-02-24 9:39 ` Zdenek Kabelac 2 siblings, 0 replies; 22+ messages in thread From: Alasdair G Kergon @ 2010-02-23 16:53 UTC (permalink / raw) To: lvm-devel On Tue, Feb 23, 2010 at 05:28:42PM +0100, Jakub Jelinek wrote: > for (i = 0; i < 1000000; i++) > switch (i % 5) > { > case 0: setlocale (LC_ALL, "C"); break; > case 1: setlocale (LC_ALL, "en_US.UTF-8"); break; > case 2: setlocale (LC_ALL, "cs_CZ.UTF-8"); break; > case 3: setlocale (LC_ALL, "fr_FR.UTF-8"); break; > case 4: setlocale (LC_ALL, "de_DE.UTF-8"); break; > } > return 0; > } > > As this is something almost > every program calls at least once, it is not a good idea to slow this down > completely unnecessarily. Indeed - but lvm only calls this once, not a million times, so I'm afraid it's the wrong optimisation for us. Locale-based functions are anyway not used in our performance-critical code paths. An 80-90% saving on the memory footprint is a much more important consideration for us. Alasdair ^ permalink raw reply [flat|nested] 22+ messages in thread
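Since lvm calls setlocale() only once, the cost that matters for it is that of a single call rather than a million. A minimal sketch for timing one call is given below, assuming a C11 libc that provides timespec_get(); the helper name time_one_setlocale is hypothetical and is not part of lvm or glibc.

```c
#include <locale.h>
#include <stddef.h>
#include <time.h>

/* Time a single setlocale(LC_ALL, name) call.
 * Returns the elapsed wall-clock time in seconds, or -1.0 if the
 * locale could not be set (setlocale() returned NULL). */
double time_one_setlocale(const char *name)
{
    struct timespec t0, t1;

    timespec_get(&t0, TIME_UTC);
    const char *res = setlocale(LC_ALL, name);
    timespec_get(&t1, TIME_UTC);

    if (res == NULL)
        return -1.0;

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Only the first call in a process is representative of what lvm pays; later calls for the same locale may be served from data that is already loaded. The measured value depends on the machine and on how locales are installed, so no figure is given here.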
* lvm and locales memory issue 2010-02-23 16:28 ` Jakub Jelinek 2010-02-23 16:53 ` Alasdair G Kergon @ 2010-02-23 16:56 ` Zdenek Kabelac 2010-02-24 9:39 ` Zdenek Kabelac 2 siblings, 0 replies; 22+ messages in thread From: Zdenek Kabelac @ 2010-02-23 16:56 UTC (permalink / raw) To: lvm-devel On 23.2.2010 17:28, Jakub Jelinek wrote: > On Tue, Feb 23, 2010 at 04:17:39PM +0100, Zdenek Kabelac wrote: >> It looks like cs_CZ.utf8/LC_MESSAGES/SYS_LC_MESSAGES is just 59 bytes. >> There is something seriously wrong with the current glibc optimization to >> have 100MB locked into memory if you want to use 59 bytes from this file.... > > LC_MESSAGES contains just yesstr/nostr definition, nothing else. > But guess your application isn't asking just for LC_MESSAGES category... But that's exactly what we actually need for lvm: translated error messages... >> Looking at the size of /usr/share/i18n/locales/cs_CZ - one may start to wonder >> why Czech locales are defining collates for arabic latin and other 'related' >> languages, while in German there is simple 'copy "iso14651_t1"' > > iso14651_t1 defines collation for all kinds of charsets, ideally cs_CZ > should just include that file too and tweak afterwards for the differences > Czech ordering requires. And what is the problem with handling it this way? >> Another note could be - Ubuntu does not even use locale-archive file and uses >> locales on per file basis - so now I'm getting curious, where are the tests, > > Not everything Ubuntu does is necessarily a good idea. > >> that proves that Fedora gets some measurable performance advantage? > > Try something trivial, like: > #include <locale.h> Yeah - typical real world application... Anyway all we want to achieve here is to have a choice - if an application is using mlockall(), it should be able to select a less memory-demanding way of handling locales - the whole code is already there and IMHO needs just a little tweaking.
Thus glibc could still handle millions of setlocale switches per second for the typical Fedora user, who surely appreciates this worthy optimization for 100MB of disk space - but if the application needs it, it should be able to select a method which allows only thousands of switches but fits into a few hundred kB of memory for mlockall(). Zdenek ^ permalink raw reply [flat|nested] 22+ messages in thread
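One way to see what a process would pin with mlockall(MCL_CURRENT) is to add up the file mappings it already holds. The sketch below is Linux-specific and assumes /proc is mounted; it sums the sizes of the mappings listed in /proc/self/maps whose line contains a given substring, such as "locale-archive". The function name is illustrative only.

```c
#include <stdio.h>
#include <string.h>

/* Sum the sizes (in bytes) of all mappings in /proc/self/maps whose
 * line contains `needle`. Returns -1 if the file cannot be opened.
 * Assumes each maps line fits into the 4096-byte buffer. */
long mapped_bytes_matching(const char *needle)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return -1;

    char line[4096];
    long total = 0;

    while (fgets(line, sizeof line, f) != NULL) {
        unsigned long start, end;

        /* Each line starts with "start-end" as hexadecimal addresses. */
        if (strstr(line, needle) != NULL
            && sscanf(line, "%lx-%lx", &start, &end) == 2)
            total += (long)(end - start);
    }

    fclose(f);
    return total;
}
```

Calling `mapped_bytes_matching("locale-archive")` after `setlocale(LC_ALL, "")` would show how much of the archive is mapped on a given system. The result depends on the glibc version and the installed locales, so it has to be measured rather than assumed.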
* lvm and locales memory issue 2010-02-23 16:28 ` Jakub Jelinek 2010-02-23 16:53 ` Alasdair G Kergon 2010-02-23 16:56 ` Zdenek Kabelac @ 2010-02-24 9:39 ` Zdenek Kabelac 2 siblings, 0 replies; 22+ messages in thread From: Zdenek Kabelac @ 2010-02-24 9:39 UTC (permalink / raw) To: lvm-devel On 23.2.2010 17:28, Jakub Jelinek wrote: >> Another note could be - Ubuntu does not even use locale-archive file and uses >> locales on per file basis - so now I'm getting curious, where are the tests, > > Not everything Ubuntu does is necessarily a good idea. > >> that proves that Fedora gets some measurable performance advantage? > > Try something trivial, like: > #include <locale.h> > > int > main (void) > { > int i; > for (i = 0; i < 1000000; i++) > switch (i % 5) > { > case 0: setlocale (LC_ALL, "C"); break; > case 1: setlocale (LC_ALL, "en_US.UTF-8"); break; > case 2: setlocale (LC_ALL, "cs_CZ.UTF-8"); break; > case 3: setlocale (LC_ALL, "fr_FR.UTF-8"); break; > case 4: setlocale (LC_ALL, "de_DE.UTF-8"); break; > } > return 0; > } > > With locale-archive 1.362s, without, using locale files, 10.355s. And From looking into the code, and keeping in mind we need to handle these millions of switches per second as a killer feature - there seems to be a nice way - instead of doing one large mmap call, how about doing several smaller mmap calls just for the regions needed by the given locale? Once mmapped, a region would stay in the program's memory until exit, just like now... So - instead of one large 100MB mmap - we would end up, for this 'benchmark', with maybe 8-10 mmap calls per setlocale - so let's say 50 mmap calls for small memory regions.
(or less - depends on how the locale-archive is organized - maybe everything except LC_CTYPE & LC_COLLATE could be in one mmapped area) As a bonus - for the most common use-case - it might eat fewer page-table entries (I assume it's ~190kB for x86_64 and a 100MB file). On the other side - a user of all locales at once would waste memory by having some pages mmapped multiple times... Zdenek ^ permalink raw reply [flat|nested] 22+ messages in thread
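The idea of mapping only the regions a locale needs, rather than the whole archive, relies on mmap() accepting a file offset, which must be page-aligned. The following is a minimal sketch of that mechanism only, not glibc's actual locale loader; struct region, map_region and unmap_region are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

struct region {
    void  *base;    /* page-aligned address returned by mmap() */
    size_t maplen;  /* number of bytes actually mapped */
    char  *data;    /* first requested byte inside the mapping */
};

/* Map only [offset, offset + len) of `fd` read-only.
 * The offset is rounded down to a page boundary because mmap()
 * requires an aligned file offset. Returns 0 on success, -1 on error. */
int map_region(int fd, off_t offset, size_t len, struct region *out)
{
    long page = sysconf(_SC_PAGESIZE);

    if (page <= 0 || len == 0 || offset < 0)
        return -1;

    off_t aligned = offset - (offset % page);
    size_t slack = (size_t)(offset - aligned);

    void *p = mmap(NULL, len + slack, PROT_READ, MAP_PRIVATE, fd, aligned);
    if (p == MAP_FAILED)
        return -1;

    out->base = p;
    out->maplen = len + slack;
    out->data = (char *)p + slack;
    return 0;
}

/* Release a mapping created by map_region(). */
void unmap_region(struct region *r)
{
    munmap(r->base, r->maplen);
}
```

With this approach only the pages that cover the requested ranges belong to the process's address space, so a later mlockall() would pin those pages instead of the whole file. The trade-off mentioned above still applies: two regions that share a page end up mapping that page twice.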