Sunday, October 6, 2024

Compile Anything; Yes, even XML, but Should You?

Recent developments in programming languages such as Java and XSLT have changed the way programmers and users look at programs. Programs written in older languages such as Fortran, COBOL, C or C++ follow a simple development model: the program is written, translated by a tool called a compiler into machine instructions for a particular computer, and then executed. Running the application executes actual machine instructions on the computer. (Here I am ignoring the repetitive aspects of partial program development and debugging.)

Languages such as Java and XSLT have a different execution model. Each is partially digested: Java is compiled into bytecodes, and XSLT is translated from its XML source format into an internal form. Each of these digested forms is then executed by another program, called an interpreter, to produce the results. Thus many computer technologists assume that Java and XSLT are inefficient because they are interpreted languages.

Or are they? This is the model the user needs to understand; however, this paper points out that under the covers the two programming models can be made the same, with comparable efficiency. This has been understood for Java for several years. The original Java Virtual Machine was an interpreter, and a slow one. Almost immediately, developers began building just-in-time compilers that compile fragments of a Java program just before they are used. More sophisticated techniques have since been developed, such as Sun's HotSpot technology.

The same idea applies to XSLT. Early users of XSLT to transform XML data assumed that XSLT must be implemented as an interpreter. We at DataPower, as well as others, have developed techniques to translate XSLT into optimized machine instructions, speeding the transformation of XML data enormously.

The idea behind these improvements, and behind the whole idea of compilation, is called partial evaluation. You have done this yourself. When you first started programming, you probably wrote code that multiplies a matrix A by a vector v. The code would look something like this (to avoid language wars I am using an artificial programming language):

        do i = 1 to 10
            x[i] = 0
            do j = 1 to 10
                x[i] = x[i] + A[i,j]*v[j]
            enddo
        enddo

Then you noticed that a particular matrix has a special form: it is upper triangular, meaning that all the elements below the diagonal are zero. For such a matrix the code above is inefficient; it performs many multiplications by zero. So you rewrote it, taking into account that in each row the elements before the diagonal are zero:

        do i = 1 to 10
            x[i] = 0
            do j = i to 10
                x[i] = x[i] + A[i,j]*v[j]
            enddo
        enddo

You may have noticed even more structure. The matrix might be diagonal, in which case the inner loop is replaced by a single execution of its body. Or the matrix might be tridiagonal, so that each row has only three non-zero terms and the loop can be replaced by three copies of the inner loop body. What you have performed is partial evaluation by hand.
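In Python, this hand specialization might be sketched as a function that inspects the matrix once, records only the nonzero entries, and returns a multiply routine that skips the zeros (this is my own illustration; the name `specialize_matvec` is invented):

```python
def specialize_matvec(A):
    """Partially evaluate matrix-vector multiply for a fixed matrix A.

    The structure of A (which entries are zero) is known now; the
    vector v is the remaining "dynamic" input, supplied later.
    """
    n = len(A)
    # Precompute, per row, only the columns holding nonzero entries.
    nonzero = [[(j, A[i][j]) for j in range(n) if A[i][j] != 0]
               for i in range(n)]

    def matvec(v):
        # Each sum touches only the nonzero terms of its row.
        return [sum(a * v[j] for j, a in row) for row in nonzero]

    return matvec

# An upper-triangular matrix: the specialized routine never performs
# a multiplication by the zero entries below the diagonal.
A = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]
matvec = specialize_matvec(A)
print(matvec([1, 1, 1]))  # [6, 9, 6]
```

The specialization cost (scanning A) is paid once; it pays off when the same matrix is multiplied by many vectors, which is exactly the tradeoff discussed below for compiling XSLT.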

The same idea applies to the compilers and interpreters mentioned above. Here the data is the Java or XSLT program, and the program is the interpreter. The goal is to transform the program (i.e., the interpreter) to create a more efficient execution. The classical compilers for Fortran, COBOL, C and C++ are just an extreme form of partial evaluation: the interpreter is completely transformed into a sequence of machine instructions, and the source of the data (i.e., the program) is no longer needed.

    (Program)XSLT interpreter---|
                                | Partial Evaluation
                                |-------------------->Executing Program
                                |
    (Data)User XSLT Program-----|

Briefly, interpreters are usually structured as a large loop. Each iteration of the loop fetches the next command from the XSLT program and jumps to the piece of code that executes it. After the code for that command has run, the interpreter jumps back to the top of the loop to fetch the next command.
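This dispatch loop can be sketched in a few lines of Python, using a toy stack machine in place of real XSLT (all command names here are invented for illustration):

```python
def interpret(program, x):
    stack = [x]
    for op, arg in program:           # the "large loop"
        if op == "push":              # fetch, then dispatch on the command
            stack.append(arg)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        # ...one branch per command; then back to the top of the loop
    return stack.pop()

# Compute (x + 1) * 2 by walking the command list one step at a time.
program = [("push", 1), ("add", None), ("push", 2), ("mul", None)]
print(interpret(program, 5))  # 12
```

Note that the fetch and the dispatch (`if`/`elif` chain) are paid on every single command, even though the program never changes between runs.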

Partial evaluation can be performed by replacing each command in the XSLT program with the instructions from the interpreter that execute that command. References to data must be adjusted to ensure that all data is referenced properly. This eliminates the interpreter's dispatch loop, which gives an immediate speed improvement, since executing that loop is one of the most expensive parts of interpretation.
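One way to sketch this transformation in Python: walk a toy command list once, ahead of time, selecting the handler code for each command, and then execute the selected handlers straight through with no per-command dispatch (invented names; a stand-in for real XSLT compilation):

```python
def compile_program(program):
    # The dispatch happens here, ONCE, at "compile" time.
    steps = []
    for op, arg in program:
        if op == "push":
            steps.append(lambda stack, a=arg: stack.append(a))
        elif op == "add":
            steps.append(lambda stack: stack.append(stack.pop() + stack.pop()))
        elif op == "mul":
            steps.append(lambda stack: stack.append(stack.pop() * stack.pop()))

    def run(x):
        stack = [x]
        for step in steps:   # straight-line execution: no fetch, no dispatch
            step(stack)
        return stack.pop()

    return run

# Same (x + 1) * 2 program as before, now compiled once and run many times.
run = compile_program([("push", 1), ("add", None), ("push", 2), ("mul", None)])
print(run(5))  # 12
```

The `a=arg` default binds each `push` operand at compile time, which is itself a small instance of the data-reference adjustment described above.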

Once the loop is eliminated, one can do more. For a start, standard compiler optimization techniques can be applied, removing repeated computations and recognizing constant expressions. Going further, optimizations specific to XSLT and XPath expressions can be performed for still larger speed gains.
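As a toy illustration of constant recognition, here is a sketch of a peephole pass over a small command list that folds a push/push/add sequence into a single push of a precomputed constant (invented names, not from any real XSLT compiler):

```python
def fold_constants(program):
    out = []
    for cmd in program:
        # If an "add" follows two pushed constants, compute the sum now.
        if (cmd[0] == "add" and len(out) >= 2
                and out[-1][0] == "push" and out[-2][0] == "push"):
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b))   # evaluated at compile time
        else:
            out.append(cmd)
    return out

print(fold_constants([("push", 1), ("push", 2), ("add", None)]))
# [('push', 3)]
```

Three commands become one, and the addition is never performed at run time at all.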

One of the biggest improvements comes from the ability to precompute information. Now that the interpreter has been replaced by executing code, the operations the interpreter repeatedly performs can be computed ahead of time, and data structures can be built to hold the results. In Java this is done for virtual method calls; in XSLT it is done for determining which template to apply. The result is a program that is far more efficient than the user's view of the programming system would suggest.
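A sketch of this precomputation in Python: build a lookup table once, up front, so that selecting a handler at run time (a stand-in for XSLT template selection) is a single table probe rather than a search through the rules (class and handler names are my own):

```python
class Dispatcher:
    def __init__(self, templates):
        # templates: {element_name: handler}. The table is built once,
        # ahead of time; each run-time lookup is a single dict probe
        # instead of a linear scan over the template rules.
        self.table = dict(templates)

    def apply(self, element_name, node):
        return self.table[element_name](node)

d = Dispatcher({
    "title":     lambda node: "<h1>%s</h1>" % node,
    "paragraph": lambda node: "<p>%s</p>" % node,
})
print(d.apply("title", "Hello"))  # <h1>Hello</h1>
```

A real XSLT processor must also honor match-pattern priorities and import precedence when building such a table; this sketch leaves all of that out.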

The major point here is that partial evaluation can always be applied, and with languages like XSLT it can be taken far. This is true in many situations in programming. The ultimate question to ask is: does partial evaluation pay off? It is a tradeoff. The costs of partial evaluation are the time spent compiling and the added memory required to store the results. Partial evaluation (i.e., compiling) pays off mightily when the same XSLT program is used repeatedly and the results of the compilation are cached for repeated use.

Excerpted from “Building an Optimizing Compiler,” by Bob Morgan; Digital Press, 1998

Bob Morgan is a Senior Engineer at DataPower Technology, a company delivering network devices for optimized XML processing and XML Web Service security. Bob is a graduate of MIT and performed PhD work at MIT in functional analysis. He is a regarded speaker at both the university and industry level and the well-known author of the book “Building an Optimizing Compiler,” published by Digital Press. He can be reached at morgan@datapower.com.
