Parsing the content of a cartouche using a “term parser”


I want to implement a domain-specific language (with its own parser) inside cartouches in Isabelle. For example, I would like the term (MY ‹123›, 3) to invoke my own parser for the substring 123, but to parse the rest normally as a term.
Following HOL/ex/Cartouche_Examples.thy, I understand how to install my own parse translation for subterms of the form MY ‹...›, and how to get the content of the cartouche as either string*Position.T or as Symbol_Pos.T list.
I also understand how to use Isabelle's parser combinators to write a parser of type term parser.
But I cannot find out how to apply the parser to a string (or a Symbol_Pos.T list).
In other words, what I am still lacking is a function
fun parse_cartouche ctx (cartouche:string) (pos:Position.T) : term = ???
that applies my parser of type term parser to the string cartouche (and correctly reports parse errors to the top level).
To clarify:
I want to make use of the existing infrastructure of Isabelle for tracking/reporting parsing locations. For example, if there is a parse error, I expect the code to be red in Isabelle/jEdit, and if, inside my own language, I call a parser like Args.parse_term, I expect Isabelle/jEdit to color variables correctly and to show type information on control-hover.
I prefer not to reimplement my own parsers for common things like ints etc., but can do so if I get at least the previous point. (However, to parse a substring of my language as a term, I would have to use some existing parsing function, since I cannot reimplement the Isabelle syntax on my own.)
Below is my complete code so far (with a dummy implementation of parse_cartouche).
theory Scratch
  imports Main
begin
ML {*
  (* In reality, this would of course be a much more complex parser. *)
  val my_parser : term parser = Parse.nat >> HOLogic.mk_nat
  (* This function should invoke my_parser to parse the content of cartouche.
     Parse errors should be properly reported (i.e., with red highlighting in
     jEdit etc.). *)
  fun parse_cartouche ctx (cartouche:string) (pos:Position.T) : term =
    (warning ("I should parse: " ^ cartouche ^ ". Returning arbitrary term instead"); @{term True})
  (* Modified from Cartouche_Examples.thy *)
  fun cartouche_tr (ctx:Proof.context) args =
    let fun err () = raise TERM ("cartouche_tr", args) in
      (case args of
        [(c as Const (@{syntax_const "_constrain"}, _)) $ Free (s, _) $ p] =>
          (case Term_Position.decode_position p of
            SOME (pos, _) => c $ (parse_cartouche ctx s pos) $ p
          | NONE => err ())
      | _ => err ())
    end;
*}
syntax "_my_syntax" :: "cartouche_position ⇒ 'a" ("MY_")
parse_translation ‹[(@{syntax_const "_my_syntax"}, cartouche_tr)]›
term "(MY ‹123›, 3)" (* Should parse as (123,3) *)
end
Because this is a relatively rare use case, I'm not sure if a "canonical" solution for this has emerged yet. But I can at least give you two examples from my own code which should help illustrate the general approach.
Evaluation of ML code in terms
The following parse translation, given a function eval_term (applied below to an Input.source and the context), extracts some ML source from a cartouche and evaluates it to a term, which is then used as the result of the parse translation.
fun term_translation ctxt args =
  let
    fun err () = raise TERM ("Splice.term_translation", args)
    (* Strip the cartouche brackets and rebuild an Input.source with accurate positions *)
    fun input s pos =
      let
        val content = Symbol_Pos.cartouche_content (Symbol_Pos.explode (s, pos))
        val (text, range) = Symbol_Pos.implode_range (Symbol_Pos.range content) content
      in
        Input.source true text range
      end
  in
    case args of
      [(c as Const (@{syntax_const "_constrain"}, _)) $ Free (s, _) $ p] =>
        (case Term_Position.decode_position p of
          SOME (pos, _) => c $ eval_term (input s pos) ctxt $ p
        | NONE => err ())
    | _ => err ()
  end
Embedding XML
This one allows me to embed XML literals into terms which will then be interpreted as terms.
syntax "_cartouche_xml" :: "cartouche_position \<Rightarrow> 'a" ("XML _")
parse_translation\<open>
  let
    fun translation args =
      let
        fun err () = raise TERM ("Common._cartouche_xml", args)
        (* Cartouche content as a plain string, without the brackets *)
        fun input s pos = Symbol_Pos.implode (Symbol_Pos.cartouche_content (Symbol_Pos.explode (s, pos)))
        val eval = Codec.the_decode Codec.term o XML.parse
      in
        case args of
          [(c as Const (@{syntax_const "_constrain"}, _)) $ Free (s, _) $ p] =>
            (case Term_Position.decode_position p of
              SOME (pos, _) => c $ eval (input s pos) $ p
            | NONE => err ())
        | _ => err ()
      end
  in
    [(@{syntax_const "_cartouche_xml"}, K translation)]
  end
\<close>
Update
The following code should allow you to turn an Input.source into something digestible for the parser combinators, including full position information:
ML ‹
  val input = ‹term"3 + 4"›;
  (* a bit more complicated than just Input.pos_of because otherwise the position includes the
     outer cartouche brackets, which manifests as an off-by-one error in the markup *)
  val pos = Input.source_explode input |> Symbol_Pos.range |> Position.range_position;
  val str = Input.source_content input;
  val toks = Token.explode Keyword.empty_keywords pos str;
  val parser = Args.$$$ "term" |-- Args.embedded_inner_syntax;
  parser toks |> fst |> Syntax.read_term @{context}
›
Based on @larsrh's answer and my own experimentation, I came up with the following answer.
The parse translation gets the cartouche content as a string, together with a position. These can be converted into a Symbol_Pos.T list using Symbol_Pos.cartouche_content o Symbol_Pos.explode. (This is covered in the examples in Cartouche_Examples.thy, distributed with Isabelle.)
The Symbol_Pos.T list can be converted into Source.source containing Symbol_Pos.Ts using Source.of_list.
The Source.source containing Symbol_Pos.Ts can be converted into a Source.source containing Token.Ts using Token.source'.
We remove whitespace tokens from this source using Token.source_proper.
And the result is converted to a Token.T list using Source.exhaust.
Finally, parsers of type 'a parser can be applied to such a Token.T list. (Or, if we have an 'a context_parser, then we need to additionally supply a context.)
Some additional work is needed: add an EOF token to the Token.T list so that parsers can detect the end of the input, and handle errors in the parser (to get nice error messages).
The code below is a complete, commented, working example (for Isabelle 2016-1); the source can also be found here.
theory Scratch
  imports Main
begin
ML {*
  (* test_parser is just a definition of a silly example parser. It parses text of the form "123 * ‹x+y›"
     where 123 is an arbitrary natural, and x+y a term. test_parser is of type term context_parser.
     The parser returns a term that is a list of 123 copies of x+y.
     If you have constructed a "term parser" instead, you can either convert it using Scan.lift,
     or modify the definition of parse_cartouche below slightly.
  *)
  fun sym_parser sym = Parse.sym_ident :-- (fn s => if s=sym then Scan.succeed () else Scan.fail) >> #1;
  val test_parser = Scan.lift Parse.nat --| Scan.lift (sym_parser "*" || Parse.reserved "x") -- Args.term
    >> (fn (n,t) => replicate n t |> HOLogic.mk_list dummyT)
  (* parse_cartouche: This function takes the cartouche that should be parsed (as a plain string
     without markup), together with its position. (All this information can be extracted using the
     information available to a parse translation, see cartouche_tr below.) *)
  fun parse_cartouche ctx (cartouche:string) (pos:Position.T) : term =
    let
      (* This extracts the content of the cartouche as a "Symbol_Pos.T list".
         One possibility to continue from here would be to write a parser that works
         on "Symbol_Pos.T list". However, most of the predefined parsers expect
         "Token.T list" (a single token may consist of several symbols, e.g., 123 is one token). *)
      val content = Symbol_Pos.cartouche_content (Symbol_Pos.explode (cartouche, pos))
      (* Translate content into a "Token.T list". *)
      val toks = content |> Source.of_list (* Create a "Source.source" containing the symbols *)
        |> Token.source' true Keyword.empty_keywords (* Translate into a "Source.source" containing tokens.
             I don't know what the argument true does here. false also works, I think. *)
        |> Token.source_proper (* Remove things like whitespaces *)
        |> Source.exhaust (* Translate the source into a list of tokens *)
        |> (fn src => src @ [Token.eof]) (* Add an eof to the end of the token list, to enable Parse.eof below *)
      (* A conversion function that produces error messages. The ignored argument here
         contains the context and the list of remaining tokens, if needed for constructing
         the message. *)
      fun errmsg (_,SOME msg) = msg
        | errmsg (_,NONE) = fn _ => "Syntax error"
      (* Apply the parser "test_parser". We additionally combine it with Parse.eof to ensure that
         the parser parses the whole text (till EOF). And we use Scan.!! to convert parsing failures
         into parsing errors, and Scan.error to report parsing errors to the toplevel. *)
      val (term,_) = Scan.error (Scan.!! errmsg (test_parser --| Scan.lift Parse.eof)) (Context.Proof ctx,toks)
      (* If test_parser was of type "term parser" instead of "term context_parser", we would use instead:
         val (term,_) = Scan.error (Scan.!! errmsg (test_parser --| Parse.eof)) toks *)
    in term end
  (* A parse translation that translates cartouches using test_parser. The code is very close to
     the examples from Cartouche_Examples.thy. It takes a given cartouche-subterm, gets its
     position, and calls parse_cartouche to do the translation to a term. *)
  fun cartouche_tr (ctx:Proof.context) args =
    let fun err () = raise TERM ("cartouche_tr", args) in
      (case args of
        [(c as Const (@{syntax_const "_constrain"}, _)) $ Free (s, _) $ p] =>
          (case Term_Position.decode_position p of
            SOME (pos, _) => c $ (parse_cartouche ctx s pos) $ p
          | NONE => err ())
      | _ => err ())
    end;
*}
(* Define a syntax for calling our translation. In this case, the syntax is "MY ‹to-be-parsed›" *)
syntax "_my_syntax" :: "cartouche_position ⇒ 'a" ("MY_")
(* Binds our parse translation to that syntax. *)
parse_translation ‹[(@{syntax_const "_my_syntax"}, cartouche_tr)]›
term "(MY ‹3 * ‹b+c››, 2)" (* Should parse as ([b+c,b+c,b+c],2) *)
term "(MY ‹10 x ‹q››, 2)" (* Should parse as ([q, q, q, q, q, q, q, q, q, q], 2) *)
term "(MY ‹3 * ‹MY ‹3 * ‹b+c››››, 2)" (* Things can be nested! *)
end
