Discussion:
[isabelle] Problems with newline in String.literal
Thiemann, René
2018-09-05 08:32:45 UTC
Permalink
Dear all,

I’m currently changing the error messages in our formalization from the “string” type to
“String.literal”, so that the exported code uses target language
strings in the generated code. Unfortunately, there seems to be a mistake
in the handling of newline:

definition "message = STR ''this is⏎a two line text''"

is translated into Haskell as

message :: String;
message = "this is\\a two line text";

where the \\ should be a \n

Moreover, also the following behavior is weird:

lemma "STR ''⏎'' = STR 0x5C" "integer_of_char (CHR ''⏎'') = 10"
by code_simp+

Here, one observes that the ''⏎'' gets different characters codes,
depending on whether it is put into a CHR or into a STR.
Note that 0x5C is precisely the backslash as in the genera
Dominique Unruh
2018-09-05 09:14:56 UTC
Permalink
As a workaround, you can use an actual newline:

definition "message = STR ''Line
Line''"
thm message_def
export_code message in Haskell

This fragment also shows that the parser and printer are out of sync here.
"thm message_def" prints the newline as ⏎ but if we use ⏎ in the
definition, we get '\\'.

Best wishes,
Dominique
Post by Thiemann, René
Dear all,
I’m currently changing the error messages in our formalization from the
“string” type to
“String.literal”, so that the exported code uses target language
strings in the generated code. Unfortunately, there seems to be a mistake
definition "message = STR ''this is⏎a two line text''"
is translated into Haskell as
message :: String;
message = "this is\\a two line text";
where the \\ should be a \n
lemma "STR ''⏎'' = STR 0x5C" "integer_of_char (CHR ''⏎'') = 10"
by code_simp+
Here, one observes that the ''⏎'' gets different characters codes,
depending on whether it is put into a CHR or into a STR.
Note that 0x5C is precisely the backslash as in the generated Haskell code.
Cheers,
René
Thiemann, René
2018-09-05 12:17:47 UTC
Permalink
Dear Dominique,

thanks for the hint, but I don’t want replace 709 \<newline> by newlines in my sources.
This will cause trouble in the indentation of the source and will be hard to undo later on.

Therefore, at the moment I just replace all \\ by \n in the generated source using "sed”.

Cheers,
René
Post by Dominique Unruh
definition "message = STR ''Line
Line''"
thm message_def
export_code message in Haskell
This fragment also shows that the parser and printer are out of sync here. "thm message_def" prints the newline as ⏎ but if we use ⏎ in the definition, we get '\\'.
Best wishes,
Dominique
Dear all,
I’m currently changing the error messages in our formalization from the “string” type to
“String.literal”, so that the exported code uses target language
strings in the generated code. Unfortunately, there seems to be a mistake
definition "message = STR ''this is⏎a two line text''"
is translated into Haskell as
message :: String;
message = "this is\\a two line text";
where the \\ should be a \n
lemma "STR ''⏎'' = STR 0x5C" "integer_of_char (CHR ''⏎'') = 10"
by code_simp+
Here, one observes that the ''⏎'' gets different characters codes,
depending on whether it is put into a CHR or into a STR.
Note that 0x5C is precisely the backslash as in the generated Haskell code.
Florian Haftmann
2018-09-10 16:48:25 UTC
Permalink
Hi Rene,

thanks for reporting this.

The reason is that by accident Isabelle2018 accepts funny literals like
»STR "ðÀµ"« which consist of a list of xsymbols »\<
>« of which only the
first character »\« is interpreted.

Future releases will follow the same convention as »CHR "
"« and hence
only allow proper ascii characters with the symbolic newline as special
case.

This is a mere syntax issue; the logical foundation is sane.

Cheers,
Florian
Post by Thiemann, René
Dear all,
I’m currently changing the error messages in our formalization from the “string” type to
“String.literal”, so that the exported code uses target language
strings in the generated code. Unfortunately, there seems to be a mistake
definition "message = STR ''this is⏎a two line text''"
is translated into Haskell as
message :: String;
message = "this is\\a two line text";
where the \\ should be a \n
lemma "STR ''⏎'' = STR 0x5C" "integer_of_char (CHR ''⏎'') = 10"
by code_simp+
Here, one observes that the ''⏎'' gets different characters codes,
depending on whether it is put into a CHR or into a STR.
Note that 0x5C is precisely the backslash as in the generated Haskell code.
Cheers,
René
--
PGP available:
http://isabelle.in.tum.de/~haftmann/pgp/florian_haftmann_at_informatik_tu_muenchen_de
Loading...