Harman Patil (Editor)

Quark Framework

First appeared: 2004
Paradigm: functional, non-strict, modular
Designed by: Luke Evans, Bo Ilic (Business Objects)
Typing discipline: static, strong, inferred

The Quark Framework (Open Quark) consists of a non-strict functional language and runtime for the Java platform. The framework allows the compilation and evaluation of functional logic on the Java Virtual Machine (JVM), directly or under the control of a regular Java application. The native language (syntax) for the Quark Framework is called CAL. A full range of Java APIs provides the means to invoke the CAL compiler, manipulate workspaces of modules, add metadata and evaluate functional logic. Functional logic is usually dynamically compiled to Java bytecodes and executed directly on the JVM (with some small kernel logic controlling the normal evaluation order). However, an interpreter (G-machine) is also available.

Motivation, History and Concept

CAL, and the associated tools and APIs forming the Quark Framework, was conceived in 1998 within Seagate Software (later Crystal Decisions, now Business Objects) as a way to enable a common framework for sharing business logic across business intelligence products. A framework was desired that would allow data behaviour to be captured, at all levels of abstraction, and for these components to be composable into the actual data flows that individual products required. In general terms, such a system had to provide a way to express data behaviour declaratively, with universal composition and typing laws. In essence, this places the problem firmly in the domain of functional languages, and the desire to allow machine compositions of functions without incurring increasing efficiency penalties strongly suggested a non-strict evaluation semantic.

As well as the operational requirements, it was envisaged that future application logic would likely be written for a dynamic platform (such as Java or .NET), and therefore it was determined that the Quark Framework should be native to Java (initially) with considerable emphasis on performance and interoperability with application logic written on that platform. In 1999, work began in Crystal's Research Group on an implementation of this framework. Many of the original insights into lazy functional systems were drawn from implementations of Haskell. Early on, Haskell (Hugs, GHC) was even considered as a starting point for the implementation itself, but a number of requirements made this impractical, so it was decided to let the project emerge and evolve freely following its own design criteria.

For the first few years of development, the CAL source language itself was not a primary motivator; the operational semantics were the main concern. At this time, CAL was merely a convenient script for expressing functions, as an alternative to composing them programmatically through Java APIs or using the graphical language native to a tool called the Gem Cutter, which began to be implemented in mid-2000 as a way to author systems of functions that could be used in applications. From about 2002 onward, the CAL language became rather more central to the Quark Framework, especially once programmers began to create usable libraries of functions for real applications. As the language evolved, so did the demand for tools, and so a range of tools and utilities was created in parallel with language development to support those doing real work with the platform.

While the Gem Cutter remained the main development environment in the initial years, since late 2005 there has been an intention to produce Eclipse-based tools, and the emphasis has shifted to activities advancing the state of Eclipse integration.

The motivations for the Quark Framework appear to be similar to those driving Microsoft's LINQ project, in particular the desire for a declarative style and some lazy evaluation for certain kinds of logic, hosted within applications coded in an object-oriented language. While CAL cannot yet be embedded inline within Java source, generated functions are fully compiled and the system can efficiently share data between logic written in CAL and logic written in Java. For instance, CAL lists can be marshalled dynamically to and from Java data structures that implement the Iterator interface.

As of 2007, the Quark Framework is an advanced and well tested framework for integrating non-strict functional logic into Java programs. It can also be used as a standalone functional language that happens to compile to Java bytecodes. The framework was offered as open source under a BSD-style license in January 2007, and continues to be used and developed within Business Objects.

CAL

CAL is a programming language originally developed by Business Objects and now released as "Open Quark", with sources, under a BSD-style license. It is a lazy functional programming language similar to the Haskell programming language. An implementation is available from the Business Objects Labs site. CAL forms part of the Quark Framework which includes a visual programming tool, Gem Cutter.

One of the main objectives of the Quark Framework is to allow logic expressed in a declarative, lazy functional style to be easily and efficiently integrated into Java applications. CAL source is typically compiled directly to bytecodes (though an interpreter is also available), and can be called from regular OO code. In turn, CAL code can call any Java code. Evaluation of CAL programs, and exploration of results, can be completely controlled by procedural code, allowing data transformation logic (for which CAL is ideally suited) to be flexibly integrated into Java applications. A Java program can also easily build new functions on-the-fly, to describe transient data flows, or to create persisted logic. This form of 'functional metaprogramming' is common in real-world deployments of the Quark Framework.

The CAL language borrows much from Haskell syntax, but also eschews some Haskell features. CAL is a strongly typed, lazily evaluated functional language, supporting algebraic functions and data types with parametric polymorphism and type inferencing. CAL has special syntax for strings, tuples, characters, numbers, lists and records. Single parameter type classes are supported, with superclasses, derived instances, deriving clauses for common classes, default class methods and higher-kinded type variables.

Although simplicity is doubtless a subjective measure, CAL's developers have tried to keep the language simple. In particular, only the expression style is supported (Haskell's equation based style with argument pattern matching is not supported), and CAL does not make use of layout (semicolons are required to terminate definitions). CAL also makes certain syntactic choices to align it more strongly with Java. For instance, Java's syntax for comments is used, and CAL's inline documentation comments are close to JavaDoc.

One of the main differences between Haskell and CAL is in the area of interfacing with the 'real world'. Whereas Haskell goes to great lengths to validate the purity of functions, CAL relies on the programmer to hide 'imported impurity', exposing pure functions from a module where impure imports are made. CAL has a range of mechanisms for controlling evaluation order and laziness. These are often essential tools in the creation of effective solutions with native functions, but are also important in the aforementioned interface with the stateful world. The choice to de-emphasise formal tracking of purity, in favour of mechanisms to allow the programmer to express the right logic, has proven to provide a good balance of flexibility and 'directness' when interfacing with external operations.

One of the main design goals for CAL was to make the language as comfortable as possible for mainstream developers to pick up and use effectively. This is reflected in choices for syntax, but also in conventions and patterns used within the standard libraries. For example, libraries use longer, descriptive names and are commented to explain the implementations and best practices for use.

A tutorial CAL module, designed as a top-to-bottom 'feature parade', showcases basic syntax with examples of some built-in and user defined data structures.

Here are a few examples, derived from that tutorial module:

Quicksort

Note that CAL supports inline documentation comments, with embedded tags. Function type declarations immediately precede the function definition. CAL supports type classes (e.g. Ord). Two local functions are declared in the let block. Quicksort has a recursive definition, building up the output list at each level of recursion from the sorts applied to the list of values either side of the pivot item.
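As a rough analogue in Java rather than CAL itself, the following sketch mirrors the structure described above: a `Comparable` bound stands in for the `Ord` type class constraint, two local helper functions play the part of the definitions in the let block, and the result is built from the sorted values on either side of the pivot. The class name `QuicksortSketch` and the helper names are illustrative assumptions, not taken from the tutorial module.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

/** Illustrative Java analogue of a functional quicksort; all names are hypothetical. */
class QuicksortSketch {

    /**
     * Returns a sorted copy of the list without mutating the input.
     * The bound on T plays the role of an Ord-style type class constraint.
     */
    static <T extends Comparable<T>> List<T> quicksort(List<T> list) {
        if (list.isEmpty()) {
            return new ArrayList<>();
        }
        T pivot = list.get(0);
        List<T> rest = list.subList(1, list.size());

        // Two local helper functions, analogous to definitions in a let block.
        BiFunction<T, List<T>, List<T>> partitionMin = (a, xs) ->
                xs.stream().filter(b -> b.compareTo(a) < 0).collect(Collectors.toList());
        BiFunction<T, List<T>, List<T>> partitionMax = (a, xs) ->
                xs.stream().filter(b -> b.compareTo(a) >= 0).collect(Collectors.toList());

        // Build the output from the sorted values on either side of the pivot.
        List<T> result = new ArrayList<>(quicksort(partitionMin.apply(pivot, rest)));
        result.add(pivot);
        result.addAll(quicksort(partitionMax.apply(pivot, rest)));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(quicksort(Arrays.asList(3, 1, 4, 1, 5, 9, 2, 6)));
    }
}
```

Unlike a lazy CAL definition, this Java version evaluates every partition eagerly; it is meant only to show the shape of the recursion.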

Data Declarations

In common with other functional languages, CAL provides a way to declare new types using data declarations. Here is an example of declaring a new data type for 'Employee':

Note that fields (data constructor arguments) must have names in CAL, as well as their type. The deriving clause makes the data type work with functions defined in certain type classes. Here we ensure that CAL automatically adds the requisite logic to make values of Employee be renderable as strings for the purposes of tracing.
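For readers more familiar with Java, a loosely comparable declaration might look like the sketch below. It is Java, not CAL: the field names `firstName` and `lastName` are assumptions chosen for illustration, and the hand-written `toString` only stands in for what a deriving clause would generate automatically.

```java
/** Hypothetical Java analogue of an Employee data type; field names are assumptions. */
final class Employee {
    // Every constructor argument has both a name and a type.
    final String firstName;
    final String lastName;

    Employee(String firstName, String lastName) {
        this.firstName = firstName;
        this.lastName = lastName;
    }

    /** Hand-written stand-in for a derived string rendering, useful for tracing. */
    @Override
    public String toString() {
        return "Employee{firstName=" + firstName + ", lastName=" + lastName + "}";
    }

    public static void main(String[] args) {
        Employee employee = new Employee("Ada", "Lovelace");
        System.out.println(employee);           // rendered via toString
        System.out.println(employee.lastName);  // extraction by field name
    }
}
```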

Fields can be extracted in a variety of ways:

a) Positionally in a case expression:

b) By field name in a case expression:

c) By a selector expression:

Note that CAL allows multiple constructor arguments to be cited in a case extractor, along with multiple constructors to match on (so long as they all have the named arguments). So, the following scenario is possible (assuming the data declaration includes the employeeID field and new constructors for Contractor and Associate):

Records

CAL unifies tuples and records, which can be used as containers for heterogeneously typed values (as compared to lists, which are sequences of values of the same type). Records are extensible and can be convenient for passing collections of values where the formality of a new data definition is not necessary. Records can have textual or numeric (ordinal) indexed fields. Traditional tuples are simply records with exclusively ordinal fields and tuple constructors (parentheses) simply generate ordinal fields in sequence up from #1.

Here are three examples of records: a tuple (demonstrating its simpler constructor syntax), a record with ordinal fields (fully equivalent to the first tuple) and a record with mixed ordinal and named fields:
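The relationship between tuples and records can be modelled in Java, purely for illustration, by treating a record as a map from field names to values. This is a hypothetical model and not how CAL represents records; the class `RecordSketch` and its methods are assumptions. It shows a tuple constructor that generates ordinal fields upward from #1, a record with the same ordinal fields comparing equal to that tuple, and extension with a named field.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of records as field-name-to-value maps; not CAL's implementation. */
class RecordSketch {

    /** Tuple constructor: generates ordinal fields #1, #2, ... in sequence. */
    static Map<String, Object> tuple(Object... values) {
        Map<String, Object> fields = new TreeMap<>();
        for (int i = 0; i < values.length; i++) {
            fields.put("#" + (i + 1), values[i]);
        }
        return fields;
    }

    /** Record extension: returns a new record with one more field; the original is untouched. */
    static Map<String, Object> extend(Map<String, Object> record, String field, Object value) {
        Map<String, Object> extended = new TreeMap<>(record);
        extended.put(field, value);
        return extended;
    }

    public static void main(String[] args) {
        Map<String, Object> asTuple = tuple("apple", 3);
        Map<String, Object> empty = new TreeMap<>();
        Map<String, Object> asOrdinalRecord = extend(extend(empty, "#1", "apple"), "#2", 3);

        // A tuple is just a record whose fields are all ordinal.
        System.out.println(asTuple.equals(asOrdinalRecord));

        // A record mixing ordinal and named fields.
        System.out.println(extend(asTuple, "name", "fruit"));
    }
}
```

Note that the map model loses the static typing of record fields that CAL provides; it captures only the naming scheme.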

CAL on Java

The CAL compiler takes CAL source, as text or as a source model (Java object model). This is processed by the early compiler stages to desugar and analyse the source. The intermediate form, plus metadata from analysis, is processed by a number of optimisers, optionally including a full rewrite optimiser capable of function fusion, deforestation and other major optimisations that preserve semantics but improve a program operationally.
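To give a feel for what function fusion means, the Java sketch below shows the equivalence such an optimisation relies on: mapping one function over a list and then mapping a second over the result gives the same answer as a single pass that applies the composed function, but without building the intermediate list. This is an illustration of the general idea only, not the Quark optimiser; the class and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

/** Illustrates the map/map equivalence behind function fusion; names are hypothetical. */
class FusionSketch {
    static final Function<Integer, Integer> INCREMENT = x -> x + 1;
    static final Function<Integer, Integer> DOUBLE = x -> x * 2;

    /** Eager map that materialises a new list on every call. */
    static <A, B> List<B> map(Function<A, B> f, List<A> xs) {
        List<B> out = new ArrayList<>(xs.size());
        for (A x : xs) {
            out.add(f.apply(x));
        }
        return out;
    }

    /** Two passes: an intermediate list is built and then discarded. */
    static List<Integer> unfused(List<Integer> xs) {
        return map(DOUBLE, map(INCREMENT, xs));
    }

    /** One pass over the input using the composed function; no intermediate list. */
    static List<Integer> fused(List<Integer> xs) {
        Function<Integer, Integer> both = INCREMENT.andThen(DOUBLE);
        return map(both, xs);
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(unfused(input).equals(fused(input)));
    }
}
```

The rewrite is only valid because both functions are free of side effects, which is why this kind of optimisation suits a functional language.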

The compiler supports pluggable back-ends. First amongst these is LECC (Lazily Evaluating CAL Compiler). This back-end generates Java classes and bytecodes directly, emitting methods according to compiler schemes that take account of context metadata derived in the compiler, such as strictness of function arguments. LECC can package generated code in a number of ways, including as a CAL Archive (CAR), or a Java JAR. At runtime, a class loader can load an entire corpus of functions, or the Quark Framework loader can load closely connected functions according to prior dependency analysis. This latter feature is important to minimise start-up times, because only the functions actually required by an application incur loading overhead.

The LECC back-end can also generate Java source code, which is then compiled by the regular JDK Java compiler to produce class files. Amongst other things, this is very useful when validating compiler schemes during compiler development, and provides a way to reason about the operational behaviour of CAL on the Java platform.

As well as LECC, Open Quark includes a G-machine interpreter and a compiler back-end that generates G-machine code. While considerably slower than LECC, this option is useful for experiments and may be a better fit for some deployments.

While many deployments of CAL may use the language standalone, the Quark Framework is fundamentally designed to be used within regular Java applications to provide a hybrid/multi-paradigm system. The intent is to allow transformational logic, which benefits from more algebraic representation, to be embeddable within Java OO logic handling regular (stateful) aspects of the application. To this end, CAL supports a very powerful and easy to use interface to Java, and the Quark Framework SDK allows Java code considerable control over how functions are evaluated and results produced. Java code can issue new functions to the Quark Framework for compilation, which can be immediately available for evaluation. Thus, Java can use the Quark Framework as a functional meta-programming environment. This has been a common use case for the framework, and is supported efficiently (low latency, with concurrent compilation and evaluation). On the consumption side, results can be presented to Java as lazy suspensions, so that minimal functional evaluation is performed and only when Java logic requests the 'next' output value. This feature allows data-flow logic to be constructed on-the-fly within a Java application, used on-demand and then disposed of if necessary. Both the LECC and G-machine runtimes are able to load and unload functions from memory to support a fully dynamic environment.
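The consumption pattern described above, where a value is computed only when Java asks for the 'next' one, can be sketched with a plain `java.util.Iterator`. The example does not use the Quark Framework SDK; `LazySequenceSketch`, `iterate` and the `evaluations` counter are hypothetical names introduced to show that no work happens until an element is requested.

```java
import java.util.Iterator;
import java.util.function.UnaryOperator;

/** Demand-driven sequence in plain Java; does not use the Quark SDK. Names are hypothetical. */
class LazySequenceSketch {
    /** Counts how many elements have actually been computed so far. */
    static int evaluations = 0;

    /** Unbounded sequence seed, step(seed), step(step(seed)), ... computed one element per request. */
    static <T> Iterator<T> iterate(T seed, UnaryOperator<T> step) {
        return new Iterator<T>() {
            private T last;
            private boolean started;

            @Override
            public boolean hasNext() {
                return true; // the sequence is conceptually infinite
            }

            @Override
            public T next() {
                // Work is done here, and only here: one element per call.
                last = started ? step.apply(last) : seed;
                started = true;
                evaluations++;
                return last;
            }
        };
    }

    public static void main(String[] args) {
        Iterator<Integer> powers = iterate(1, x -> x * 2);
        for (int i = 0; i < 3; i++) {
            System.out.println(powers.next());
        }
        System.out.println("elements computed: " + evaluations);
    }
}
```

Because the consumer drives evaluation, an unbounded data flow can be created, sampled and then discarded without ever being fully computed.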

The 'foreign' interface between CAL and Java is able to import any Java entity into CAL and make calls on Java methods. Values are passed efficiently between the two environments, without unnecessary boxing/unboxing operations. A powerful feature called "I/O policies" allows values to be lazily marshalled between structures if required (for instance, if you have a particular Java class or data structure that you wish to produce to represent a CAL value). These policies are declared completely on the CAL side, leaving the Java side 'natural'. The default policies are usually quite sufficient to share values, so usually nothing special must be done to exchange values.

Here are some examples of CAL code that declares interfaces to Java entities:

Importing a Java type, a constructor and a method

The following code imports the "java.util.LinkedList" class (which becomes the CAL 'opaque' data type "JLinkedList"). The fragment then imports the default constructor for this class, and the instance method 'add'. All of these imports are marked private, which means that they would only be usable within the importing CAL module. This is quite common, as it is usually good practice to export public functions from modules that behave as pure functions.
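The same discipline, keeping the impure pieces private and exporting only a function that behaves purely, can be sketched on the Java side. The example below is Java, not a CAL foreign declaration; it uses the real `java.util.LinkedList` default constructor and `add` method, while the wrapper class and method names are assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;

/** Hides a mutating operation behind a function that behaves purely; names are hypothetical. */
class PureWrapperSketch {

    /** Private helper: mutates its argument, so it is not exposed to callers. */
    private static boolean linkedListAdd(LinkedList<Object> list, Object item) {
        return list.add(item);
    }

    /** Public function: equal inputs give equal outputs, with no caller-visible mutation. */
    static List<Object> toLinkedList(List<?> items) {
        LinkedList<Object> list = new LinkedList<>(); // the default constructor
        for (Object item : items) {
            linkedListAdd(list, item);
        }
        // The mutable list never escapes in modifiable form.
        return Collections.unmodifiableList(list);
    }

    public static void main(String[] args) {
        System.out.println(toLinkedList(Arrays.asList("a", "b", "c")));
    }
}
```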

Fields can be imported too

This is especially useful for constants, per the example.

Casts for Java types

Occasionally, it is necessary to cast between Java types that have been imported. For instance, JObject and JList are imported in the Prelude. If you needed to cast between them in CAL, then you could declare the appropriate casting functions. These declarations are properly checked for validity.
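On the Java side, the two directions of such a cast behave differently, which the following sketch illustrates. It is plain Java rather than a CAL cast declaration, and the method names are hypothetical: widening a `List` to `Object` always succeeds, while narrowing an `Object` to `List` is checked at run time and can throw `ClassCastException`.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows the safe and the checked direction of a cast; method names are hypothetical. */
class CastSketch {

    /** Widening cast: always succeeds. */
    static Object listToObject(List<?> list) {
        return list;
    }

    /** Narrowing cast: checked at run time; throws ClassCastException if the value is not a List. */
    static List<?> objectToList(Object object) {
        return (List<?>) object;
    }

    public static void main(String[] args) {
        Object boxed = listToObject(new ArrayList<String>());
        System.out.println(objectToList(boxed).isEmpty());

        try {
            objectToList("not a list");
        } catch (ClassCastException e) {
            System.out.println("cast rejected at run time");
        }
    }
}
```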

Of course, with casting, there is always the potential for runtime class cast exceptions, so it is a good thing that CAL's exception features are fully integrated with Java too.

Java Exceptions

The following fragment shows a CAL exception type being declared and a function that can demonstrate the throwing in CAL of this exception, or a Java NullPointerException depending on the argument it is passed. Note that CAL exceptions can have any payload. The definition of nullPointerException_make is not shown for brevity.

If you want, you can make any type an Exception. See the following fragment:

Finally, and just for fun, we show how you can map an exception handler over a list!

The three arithmetic exceptions (divide by zero) are converted to the default value (-999) by the exception handler.
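A comparable effect can be sketched in Java by wrapping a function in a handler and mapping the wrapped function over a list. This is an analogue in Java, not the CAL fragment: the numerator, the list of divisors and the names `withDefault` and `divideAll` are assumptions chosen so that exactly three divisions by zero occur and are replaced by the default value -999.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Maps an exception-handling wrapper over a list; inputs and names are hypothetical. */
class ExceptionMapSketch {

    /** Wraps f so that an ArithmeticException yields the fallback value instead. */
    static <A, B> Function<A, B> withDefault(Function<A, B> f, B fallback) {
        return a -> {
            try {
                return f.apply(a);
            } catch (ArithmeticException e) {
                return fallback;
            }
        };
    }

    /** Divides the numerator by each divisor, substituting -999 where division fails. */
    static List<Integer> divideAll(int numerator, List<Integer> divisors) {
        Function<Integer, Integer> divide = d -> numerator / d;
        return divisors.stream()
                .map(withDefault(divide, -999))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Three of the six divisors are zero, so three results fall back to -999.
        System.out.println(divideAll(12, Arrays.asList(4, 0, 2, 0, 1, 0)));
    }
}
```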
