pre_parse and strcmp

* pre_parse and strcmp
@ 2014-10-02 21:09 Konrad Rzeszutek Wilk
  2014-10-02 21:15 ` Konrad Rzeszutek Wilk
  0 siblings, 1 reply; 6+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-10-02 21:09 UTC
  To: xen-devel, jbeulich

Hey Jan,

I've been digging a bit in the Xen code to figure out how to use the xen.cfg
file as it seemed not to work at all for me.

I finally narrowed it down to the fact that the file, while looking OK:

[root@localhost fedora]# more XEN.CFG 

[global]
default=linux

[linux]
options="console=vga,com1 com1=115200,8n1,pci,0 loglvl=all noreboot guest_loglvl=all iommu=verbose,no-intremap,debug"

kernel=vmlinuz-3.17.0-rc7+ root=/dev/mapper/fedora-root console=hvc0

ramdisk=initramfs-3.17.0-rc7+.img

Was typed up by me on the EFI shell instead of written from Linux.

Which means:
[root@localhost fedora]# file *.cfg
grub.cfg: ASCII text
xen.cfg:  Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

It ends up looking like so (this is right before pre_parse, dumped by a function
that prints each character, substituting _ for every control character):

[[_g_l_o_b_a_l_]_____d_e_f_a_u_l_t_=_l_i_n_u_x_________[_l_i_n_u_x_]_____o_p_t_i_o_n_s_=_"_c_o_n_s_o_l_e_=_v_g_a_,_c_o_m_1_ _c_o_m_1_=_1_1_5_2_0_0_,_8_n_1_,_p_c_i_,_0_ _l_o_g_l_v_l_=_a_l_l_ _n_o_r_e_b_o_o_t_ _g_u_e_s_t___l_o_g_l_v_l_=_a_l_l_ _i_o_m_m_u_=_v_e_r_b_o_s_e_,_n_o_-_i_n_t_r_e_m_a_p_,_d_e_b_u_g_"_____k_e_r_n_e_l_=_v_m_l_i_n_u_z_-_3_._1_7_._0_-_r_c_7_+_ _r_o_o_t_=_/_d_e_v_/_m_a_p_p_e_r_/_f_e_d_o_r_a_-_r_o_o_t_ _c_o_n_s_o_l_e_=_h_v_c_0_____r_a_m_d_i_s_k_=_i_n_i_t_r_a_m_f_s_-_3_._1_7_._0_-_r_c_7_+_._i_m_g_____________

pre_parse deals with this by replacing all of those pesky control characters
with \0.

That means that when 'get_value' is asked to find 'default' in the global
section, it finds the '[' and then uses 'strncmp' to see whether 'default' is
there. What is actually there is 'd_e_f_a_u_l_t' (where _ stands for the \0
character), which of course does not match 'default'.
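
The mismatch described above can be reproduced outside of Xen. The following
is only a sketch: 'zero_control_chars' and 'starts_with_name' are hypothetical
stand-ins for what the thread says pre_parse and get_value do, not the actual
Xen functions, and the byte arrays are hand-written samples.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the pre_parse step described above:
 * every control character in the buffer is overwritten with '\0'.
 */
static void zero_control_chars(char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        if (iscntrl((unsigned char)buf[i]))
            buf[i] = '\0';
}

/*
 * Hypothetical stand-in for the lookup in get_value:
 * returns 1 if the bytes at 'buf' begin with the ASCII string 'name'.
 */
static int starts_with_name(const char *buf, const char *name)
{
    return strncmp(buf, name, strlen(name)) == 0;
}

/* The same key as plain ASCII ... */
static const char ascii_default[] = "default=linux";

/* ... and as UTF-16LE, where every ASCII letter is followed by a 0x00 byte. */
static const char utf16le_default[] = {
    'd', 0, 'e', 0, 'f', 0, 'a', 0, 'u', 0, 'l', 0, 't', 0, '=', 0
};
```

With these definitions, starts_with_name(ascii_default, "default") succeeds,
while starts_with_name(utf16le_default, "default") fails at the second byte,
because strncmp compares the 0x00 that follows 'd' against 'e'.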


* Re: pre_parse and strcmp
  2014-10-02 21:09 pre_parse and strcmp Konrad Rzeszutek Wilk
@ 2014-10-02 21:15 ` Konrad Rzeszutek Wilk
  2014-10-03 16:17   ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-10-02 21:15 UTC
  To: xen-devel, jbeulich

On Thu, Oct 02, 2014 at 05:09:39PM -0400, Konrad Rzeszutek Wilk wrote:
> Hey Jan,
> 
> I've been digging a bit in the Xen code to figure out how to use the xen.cfg
> file as it seemed not to work at all for me.
> 
> I finally narrowed it down to the fact that the file, while looking OK:
> 
> [root@localhost fedora]# more XEN.CFG 
> 
> [global]
> default=linux
> 
> [linux]
> options="console=vga,com1 com1=115200,8n1,pci,0 loglvl=all noreboot guest_loglvl=all iommu=verbose,no-intremap,debug"
> 
> kernel=vmlinuz-3.17.0-rc7+ root=/dev/mapper/fedora-root console=hvc0
> 
> ramdisk=initramfs-3.17.0-rc7+.img
> 
> Was typed up by me on the EFI shell instead of written from Linux.
> 
> Which means:
> [root@localhost fedora]# file *.cfg
> grub.cfg: ASCII text
> xen.cfg:  Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
> 
> Ends up looking at so (this is right before pre_parse, with a function that prints
> each character and then if it sees an control it prints _):
> 
> 
> [[_g_l_o_b_a_l_]_____d_e_f_a_u_l_t_=_l_i_n_u_x_________[_l_i_n_u_x_]_____o_p_t_i_o_n_s_=_"_c_o_n_s_o_l_e_=_v_g_a_,_c_o_m_1_ _c_o_m_1_=_1_1_5_2_0_0_,_8_n_1_,_p_c_i_,_0_ _l_o_g_l_v_l_=_a_l_l_ _n_o_r_e_b_o_o_t_ _g_u_e_s_t___l_o_g_l_v_l_=_a_l_l_ _i_o_m_m_u_=_v_e_r_b_o_s_e_,_n_o_-_i_n_t_r_e_m_a_p_,_d_e_b_u_g_"_____k_e_r_n_e_l_=_v_m_l_i_n_u_z_-_3_._1_7_._0_-_r_c_7_+_ _r_o_o_t_=_/_d_e_v_/_m_a_p_p_e_r_/_f_e_d_o_r_a_-_r_o_o_t_ _c_o_n_s_o_l_e_=_h_v_c_0_____r_a_m_d_i_s_k_=_i_n_i_t_r_a_m_f_s_-_3_._1_7_._0_-_r_c_7_+_._i_m_g_____________
> 
> 
> Which pre_parse deals with - it replaces all of those pesky control ones to \0.
> 
> That means the 'get_value' is asked to find the 'default' in global, and
> while it finds '[' it uses 'strncmp' to see if the 'default' is there.
> The value that is there is 'd_e_f_a_u_l_t' (_ replaces the \0
> character) which of course does not match with 'default'.

Hit send too fast [I was going to include a patch in this email
once I had completed it].

My thinking is that the best solution is to have a function similar to
'pre_parse' that would convert the in-memory buffer from UTF-16 to a normal
ASCII one.

And hook it up in pre_parse to fix this up.
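
A conversion along these lines could look roughly like the sketch below. It is
not the patch mentioned above; 'utf16le_to_ascii' is a hypothetical name, and
the sketch assumes little-endian UTF-16 input and rejects anything outside
7-bit ASCII rather than trying to transliterate it.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: narrow a UTF-16LE buffer to ASCII in place.
 *
 * A leading BOM (bytes 0xFF 0xFE) is skipped if present. Returns the number
 * of ASCII bytes now stored at the start of 'buf', or (size_t)-1 if a code
 * unit does not fit in 7-bit ASCII (the buffer may be partly rewritten in
 * that case). A trailing odd byte is ignored.
 */
static size_t utf16le_to_ascii(unsigned char *buf, size_t len)
{
    size_t in = 0, out = 0;

    assert(buf != NULL);

    /* Skip the UTF-16LE byte order mark, if any. */
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        in = 2;

    /* 'out' never overtakes 'in', so writing in place is safe. */
    for (; in + 1 < len; in += 2) {
        if (buf[in + 1] != 0 || buf[in] > 0x7F)
            return (size_t)-1;
        buf[out++] = buf[in];
    }

    return out;
}
```

The caller would then treat only the first 'out' bytes as the config file
contents and hand that shorter buffer to the existing parsing code.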


* Re: pre_parse and strcmp
  2014-10-02 21:15 ` Konrad Rzeszutek Wilk
@ 2014-10-03 16:17   ` Jan Beulich
  2014-10-03 16:22     ` konrad wilk
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2014-10-03 16:17 UTC
  To: konrad.wilk; +Cc: xen-devel

>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/02/14 11:15 PM >>>
>Hit sent to fast [was going to include a patch in this email
>once I had completed this]
>
>My thinking is that the best solution is to have a similar to 'pre_parse'
>function that would convert the in memory buffer from UTF-16 to a normal
>ascii type one.
>
>And hook it up in pre-parse to fix this up.

I certainly don't mind a patch to deal with UTF-16 config files (in fact, when I
originally coded it I already considered this a good future enhancement).
I'm not, however, convinced that simply converting back to ASCII is the proper
solution here. Instead, if we want to allow UTF-16 config files, we should do the
conversion the other way around.
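
One possible reading of "the other way around" is to widen ASCII input to
16-bit characters, so that everything downstream works on one wide
representation. That reading is an assumption, not something spelled out in
this message. A minimal sketch, with 'ascii_to_utf16' as a hypothetical name
and uint16_t standing in for the firmware's 16-bit character type:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: widen 'len' ASCII bytes from 'src' into 16-bit code
 * units in 'dst', which has room for 'dst_units' units.
 *
 * Returns the number of units written, or (size_t)-1 if 'dst' is too small
 * or a source byte is outside 7-bit ASCII. No terminator is appended.
 */
static size_t ascii_to_utf16(const char *src, size_t len,
                             uint16_t *dst, size_t dst_units)
{
    assert(src != NULL && dst != NULL);

    if (dst_units < len)
        return (size_t)-1;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)src[i];

        if (c > 0x7F)
            return (size_t)-1;
        dst[i] = c;   /* ASCII maps to the same code point in UTF-16 */
    }

    return len;
}
```

Key lookups would then have to compare 16-bit units rather than bytes, which
is a larger change than the narrowing approach.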

And then of course there is the problem of detection: The example you gave didn't
make clear whether the file properly starts with a BOM, yet if it doesn't,
telling ASCII from UTF-16 is guesswork.
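
BOM-based detection can be sketched as below. The enum and the function name
'detect_bom' are hypothetical; the byte sequences are the standard Unicode
byte order marks. Note that the UTF-32LE BOM (FF FE 00 00) begins with the
UTF-16LE one, so this sketch would report such a file as UTF-16LE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for the sketch below. */
enum cfg_encoding {
    CFG_NO_BOM,     /* no BOM found: could be ASCII, UTF-8, or anything else */
    CFG_UTF16LE,    /* FF FE */
    CFG_UTF16BE,    /* FE FF */
    CFG_UTF8_BOM    /* EF BB BF */
};

/* Classify a buffer purely by its leading byte order mark, if any. */
static enum cfg_encoding detect_bom(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    if (len >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return CFG_UTF8_BOM;
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
        return CFG_UTF16LE;
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
        return CFG_UTF16BE;

    return CFG_NO_BOM;
}
```

When the result is CFG_NO_BOM, nothing has been learned about the encoding,
which is exactly the guesswork case described above.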

Jan


* Re: pre_parse and strcmp
  2014-10-03 16:17   ` Jan Beulich
@ 2014-10-03 16:22     ` konrad wilk
  2014-10-04 14:57       ` Don Slutz
  0 siblings, 1 reply; 6+ messages in thread
From: konrad wilk @ 2014-10-03 16:22 UTC
  To: Jan Beulich; +Cc: xen-devel

On 10/3/2014 12:17 PM, Jan Beulich wrote:
>>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/02/14 11:15 PM >>>
>> Hit sent to fast [was going to include a patch in this email
>> once I had completed this]
>>
>> My thinking is that the best solution is to have a similar to 'pre_parse'
>> function that would convert the in memory buffer from UTF-16 to a normal
>> ascii type one.
>>
>> And hook it up in pre-parse to fix this up.
>
> I certainly don't mind a patch to deal with UTF-16 config files so (in fact already
> when I originally coded it I considered this would be a good future enhancement).
> I'm not, however, convinced that simply converting back to ASCII is the proper
> solution here. Instead, if we want to allow UTF-16 config files, we should do the
> conversion the other way around.

OK. That will take some time to cobble up.
>
> And then of course there is the problem of detection: The example you gave didn't
> make clear whether the file was properly starting with a BOM, yet if it doesn't
> telling ASCII from UTF-16 is guesswork.

Ah, so that is what the odd character at the start was (BOM)! Yes, the 
file is very much that type.
>
> Jan
>
>


* Re: pre_parse and strcmp
  2014-10-03 16:22     ` konrad wilk
@ 2014-10-04 14:57       ` Don Slutz
  2014-10-06  8:11         ` Jan Beulich
  0 siblings, 1 reply; 6+ messages in thread
From: Don Slutz @ 2014-10-04 14:57 UTC
  To: konrad wilk; +Cc: xen-devel, Jan Beulich

On 10/03/14 12:22, konrad wilk wrote:
> On 10/3/2014 12:17 PM, Jan Beulich wrote:
>>>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/02/14 11:15 PM >>>
>>> Hit sent to fast [was going to include a patch in this email
>>> once I had completed this]
>>>
>>> My thinking is that the best solution is to have a similar to 
>>> 'pre_parse'
>>> function that would convert the in memory buffer from UTF-16 to a 
>>> normal
>>> ascii type one.
>>>
>>> And hook it up in pre-parse to fix this up.
>>
>> I certainly don't mind a patch to deal with UTF-16 config files so 
>> (in fact already
>> when I originally coded it I considered this would be a good future 
>> enhancement).
>> I'm not, however, convinced that simply converting back to ASCII is 
>> the proper
>> solution here. Instead, if we want to allow UTF-16 config files, we 
>> should do the
>> conversion the other way around.
>
> OK. That will take some time to cobble up.

A simpler change might be to convert to UTF-8 (my guess is that it would
then look like ASCII, so strcmp would continue to "work").

    -Don Slutz

>>
>> And then of course there is the problem of detection: The example you 
>> gave didn't
>> make clear whether the file was properly starting with a BOM, yet if 
>> it doesn't
>> telling ASCII from UTF-16 is guesswork.
>
> Ah, so that is what the odd character at the start was (BOM)! Yes, the 
> file is very much that type.
>>
>> Jan
>>
>>
>
>


* Re: pre_parse and strcmp
  2014-10-04 14:57       ` Don Slutz
@ 2014-10-06  8:11         ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2014-10-06  8:11 UTC
  To: konrad wilk, Don Slutz; +Cc: xen-devel

>>> On 04.10.14 at 16:57, <dslutz@verizon.com> wrote:
> On 10/03/14 12:22, konrad wilk wrote:
>> On 10/3/2014 12:17 PM, Jan Beulich wrote:
>>>>>> Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> 10/02/14 11:15 PM >>>
>>>> Hit sent to fast [was going to include a patch in this email
>>>> once I had completed this]
>>>>
>>>> My thinking is that the best solution is to have a similar to 
>>>> 'pre_parse'
>>>> function that would convert the in memory buffer from UTF-16 to a 
>>>> normal
>>>> ascii type one.
>>>>
>>>> And hook it up in pre-parse to fix this up.
>>>
>>> I certainly don't mind a patch to deal with UTF-16 config files so 
>>> (in fact already
>>> when I originally coded it I considered this would be a good future 
>>> enhancement).
>>> I'm not, however, convinced that simply converting back to ASCII is 
>>> the proper
>>> solution here. Instead, if we want to allow UTF-16 config files, we 
>>> should do the
>>> conversion the other way around.
>>
>> OK. That will take some time to cobble up.
> 
> A simpler change might be to UTF-8.  (my guess would be that it would 
> then look
> like ASCII and so strcmp would continue to "work".

Since you can't pass UTF-8 strings to EFI services, we'd still need
another translation subsequently, yet obviously we should try to
avoid doing more translation rounds than necessary. Furthermore,
that would then raise the question of whether to treat the config
file itself as UTF-8 instead of ASCII (unlike UTF-16 files, UTF-8
files frequently don't come with a BOM at their start, and hence
can't be told apart without assumptions/prior agreement).

Jan
