text‹ The code generator is actually a framework consisting of different components which can be customised individually. Conceptually all components operate on Isabelle's logic framework Pure. Practically, the object logic HOL provides the necessary facilities to make use of the code generator, mainly since it is an extension of Isabelle/Pure. The constellation of the different components is visualized in the following picture. \begin{figure}[h] \def\sys#1{\emph{#1}} \begin{tikzpicture}[x = 4cm, y = 1cm] \tikzstyle positive=[color = black, fill = white]; \tikzstyle negative=[color = white, fill = black]; \tikzstyle entity=[rounded corners, draw, thick]; \tikzstyle process=[ellipse, draw, thick]; \tikzstyle arrow=[-stealth, semithick]; \node (spec) at (0, 3) [entity, positive] {specification tools}; \node (user) at (1, 3) [entity, positive] {user proofs}; \node (spec_user_join) at (0.5, 3) [shape=coordinate] {}; \node (raw) at (0.5, 4) [entity, positive] {raw code equations}; \node (pre) at (1.5, 4) [process, positive] {preprocessing}; \node (eqn) at (2.5, 4) [entity, positive] {code equations}; \node (iml) at (0.5, 0) [entity, positive] {intermediate program}; \node (seri) at (1.5, 0) [process, positive] {serialisation}; \node (SML) at (2.5, 3) [entity, positive] {\sys{SML}}; \node (OCaml) at (2.5, 2) [entity, positive] {\sys{OCaml}}; \node (Haskell) at (2.5, 1) [entity, positive] {\sys{Haskell}}; \node (Scala) at (2.5, 0) [entity, positive] {\sys{Scala}}; \draw [semithick] (spec) -- (spec_user_join); \draw [semithick] (user) -- (spec_user_join); \draw [-diamond, semithick] (spec_user_join) -- (raw); \draw [arrow] (raw) -- (pre); \draw [arrow] (pre) -- (eqn); \draw [arrow] (eqn) -- node (transl) [process, positive] {translation} (iml); \draw [arrow] (iml) -- (seri); \draw [arrow] (seri) -- (SML); \draw [arrow] (seri) -- (OCaml); \draw [arrow] (seri) -- (Haskell); \draw [arrow] (seri) -- (Scala); \end{tikzpicture} \caption{Code generator architecture} \label{fig:arch} \end{figure} \noindent Central to code generation is the notion of \emph{code equations}. A code equation as a first approximation is a theorem of the form ‹f t🪙1 t🪙2 … t🪙n ≡ t›(an equation headed by a constant ‹f›with arguments ‹t🪙1 t🪙2 … t🪙n› and right hand side ‹t›). \begin{itemize} \item Starting point of code generation is a collection of (raw) code equations in a theory. It is not relevant where they stem from, but typically they were either produced by specification tools or proved explicitly by the user. \item These raw code equations can be subjected to theorem transformations. This \qn{preprocessor} (see \secref{sec:preproc}) can apply the full expressiveness of ML-based theorem transformations to code generation. The result of preprocessing is a structured collection of code equations. \item These code equations are \qn{translated} to a program in an abstract intermediate language. Think of it as a kind of \qt{Mini-Haskell} with four \qn{statements}: ‹data›(for datatypes), ‹fun›(stemming from code equations), also ‹class›and ‹inst› (for type classes). \item Finally, the abstract program is \qn{serialised} into concrete source code of a target language. This step only produces concrete syntax but does not change the program in essence; all conceptual transformations occur in the translation step. \end{itemize} \noindent From these steps, only the last two are carried out outside the logic; by keeping this layer as thin as possible, the amount of code to trust is kept to a minimum. ›
subsection‹The pre- and postprocessor \label{sec:preproc}›
text‹ Before selected function theorems are turned into abstract code, a chain of definitional transformation steps is carried out: \emph{preprocessing}. The preprocessor consists of two components: a \emph{simpset} and \emph{function transformers}. The preprocessor simpset has a disparate brother, the \emph{postprocessor simpset}. In the theory-to-code scenario depicted in the picture above, it plays no role. But if generated code is used to evaluate expressions (cf.~\secref{sec:evaluation}), the postprocessor simpset is applied to the resulting expression before this is turned back. The pre- and postprocessor \emph{simpsets} can apply the full generality of the Isabelle simplifier. Due to the interpretation of theorems as code equations, rewrites are applied to the right hand side and the arguments of the left hand side of an equation, but never to the constant heading the left hand side. Pre- and postprocessor can be setup to transfer between expressions suitable for logical reasoning and expressions suitable for execution. As example, take list membership; logically it is expressed as 🍋‹x ∈ set xs›. But for execution the intermediate set is not desirable. Hence the following specification: ›
definition %quote member :: "'a list ==> 'a ==> bool" where
[code_abbrev]: "member xs x ⟷ x ∈ set xs"
text‹ \noindent The \emph{@{attribute code_abbrev} attribute} declares its theorem a rewrite rule for the postprocessor and the symmetric of its theorem as rewrite rule for the preprocessor. Together, this has the effect that expressions @{thm (rhs) member_def [no_vars]} are replaced by @{thm (lhs) member_def [no_vars]} in generated code, but are turned back into @{thm (rhs) member_def [no_vars]} if generated code is used for evaluation. Rewrite rules for pre- or postprocessor may be declared independently using \emph{@{attribute code_unfold}} or \emph{@{attribute code_post}} respectively. \emph{Function transformers} provide a very general interface, transforming a list of function theorems to another list of function theorems, provided that neither the heading constant nor its type change. The 🍋‹0::nat›/ 🍋‹Suc› pattern used in theory ‹Code_Abstract_Nat›(see \secref{abstract_nat}) uses this interface. \noindent The current setup of the pre- and postprocessor may be inspected using the @{command_def print_codeproc} command. @{command_def code_thms} (see \secref{sec:equations}) provides a convenient mechanism to inspect the impact of a preprocessor setup on code equations. Attribute @{attribute code_preproc_trace} allows for low-level tracing: ›
text‹ As told in \secref{sec:principle}, the notion of code equations is vital to code generation. Indeed most problems which occur in practice can be resolved by an inspection of the underlying code equations. It is possible to exchange the default code equations for constants by explicitly proving alternative ones: ›
text‹ \noindent The annotation ‹[code]›is an ‹attribute› which states that the given theorems should be considered as code equations for a ‹fun›statement -- the corresponding constant is determined syntactically. The resulting code: ›
text %quote ‹ @{code_stmts dequeue constant: dequeue (Haskell)} ›
text‹ \noindent You may note that the equality test 🍋‹xs = []›has been replaced by the predicate 🍋‹List.null xs›. This is due to the default setup of the \qn{preprocessor}. This possibility to select arbitrary code equations is the key technique for program and datatype refinement (see \secref{sec:refinement}). Due to the preprocessor, there is the distinction of raw code equations (before preprocessing) and code equations (after preprocessing). The first can be listed (among other data) using the @{command_def print_codesetup} command. The code equations after preprocessing are already are blueprint of the generated program and can be inspected using the @{command code_thms} command: ›
code_thms %quote dequeue
text‹ \noindent This prints a table with the code equations for 🍋‹dequeue›, including \emph{all} code equations those equations depend on recursively. These dependencies themselves can be visualized using the @{command_def code_deps} command. ›
subsection‹Equality›
text‹ Implementation of equality deserves some attention. Here an example function involving polymorphic equality: ›
primrec %quote collect_duplicates :: "'a list ==> 'a list ==> 'a list ==> 'a list"where "collect_duplicates xs ys [] = xs"
| "collect_duplicates xs ys (z#zs) = (if z ∈ set xs then if z ∈ set ys then collect_duplicates xs ys zs else collect_duplicates xs (z#ys) zs else collect_duplicates (z#xs) (z#ys) zs)"
text‹ \noindent During preprocessing, the membership test is rewritten, resulting in 🍋‹List.member›, which itself performs an explicit equality check, as can be seen in the corresponding ‹SML›code: ›
text %quote ‹ @{code_stmts collect_duplicates (SML)} ›
text‹ \noindent Obviously, polymorphic equality is implemented the Haskell way using a type class. How is this achieved? HOL introduces an explicit class 🍋‹equal›with a corresponding operation 🍋‹HOL.equal› such that @{thm equal [no_vars]}. The preprocessing framework does the rest by propagating the 🍋‹equal›constraints through all dependent code equations. For datatypes, instances of 🍋‹equal›are implicitly derived when possible. For other types, you may instantiate ‹equal›manually like any other type class. ›
text‹ Explicit partiality is caused by missing patterns (in contrast to partiality caused by nontermination, which is covered in Section~\ref{sec:partial}). Here is an example, again for amortised queues: ›
definition %quote strict_dequeue :: "'a queue ==> 'a × 'a queue"where "strict_dequeue q = (case dequeue q of (Some x, q') ==> (x, q'))"
text‹ Observe that on the right hand side of the definition of 🍋‹strict_dequeue'›,the unspecified constant 🍋‹empty_queue› occurs. An attempt to generate code for 🍋‹strict_dequeue'›would make the code generator complain that 🍋‹empty_queue›has no associated code equations. In most situations unimplemented constants indeed indicated a broken program; however such constants can also be thought of as function definitions which always fail, since there is never a successful pattern match on the left hand side. In order to categorise a constant into that category explicitly, use the @{attribute code} attribute with ‹abort›: ›
declare %quote [[code abort: empty_queue]]
text‹ \noindent Then the code generator will just insert an error or exception at the appropriate position: ›
text %quote ‹ @{code_stmts strict_dequeue' constant: empty_queue strict_dequeue' (Haskell)} ›
text‹ \noindent This feature however is rarely needed in practice. Note that the HOL default setup already includes ›
declare %quote [[code abort: undefined]]
text‹ \noindent -- hence 🍋‹undefined›can always be used in such situations. ›
text‹ Under certain circumstances, the code generator fails to produce code entirely. To debug these, the following hints may prove helpful: \begin{description} \ditem{\emph{Check with a different target language}.} Sometimes the situation gets more clear if you switch to another target language; the code generated there might give some hints what prevents the code generator to produce code for the desired language. \ditem{\emph{Inspect code equations}.} Code equations are the central carrier of code generation. Most problems occurring while generating code can be traced to single equations which are printed as part of the error message. A closer inspection of those may offer the key for solving issues (cf.~\secref{sec:equations}). \ditem{\emph{Inspect preprocessor setup}.} The preprocessor might transform code equations unexpectedly; to understand an inspection of its setup is necessary (cf.~\secref{sec:preproc}). \ditem{\emph{Generate exceptions}.} If the code generator complains about missing code equations, in can be helpful to implement the offending constants as exceptions (cf.~\secref{sec:partiality}); this allows at least for a formal generation of code, whose inspection may then give clues what is wrong. \ditem{\emph{Remove offending code equations}.} If code generation is prevented by just a single equation, this can be removed (cf.~\secref{sec:equations}) to allow formal code generation, whose result in turn can be used to trace the problem. The most prominent case here are mismatches in type class signatures (\qt{wellsortedness error}). \end{description} ›
end
Messung V0.5 in Prozent
¤ Dauer der Verarbeitung: 0.15 Sekunden
(vorverarbeitet am 2026-05-01)
¤
Die Informationen auf dieser Webseite wurden
nach bestem Wissen sorgfältig zusammengestellt. Es wird jedoch weder Vollständigkeit, noch Richtigkeit,
noch Qualität der bereit gestellten Informationen zugesichert.
Bemerkung:
Die farbliche Syntaxdarstellung und die Messung sind noch experimentell.