Friday, October 18, 2024

Getting a Handle on Java Performance

Java’s strong appeal for embedded applications is sometimes offset by concerns about its speed and its memory requirements. However, there are techniques that you can use to boost Java performance and reduce memory needs, and of course the Java virtual machine you choose affects Java performance, too. You can make better-informed decisions about using Java by understanding the factors that affect its performance and selecting meaningful benchmarks for embedded applications.

Techniques for improving application execution and choosing the right Java virtual machine (JVM) address only a few aspects of system architecture that affect overall Java performance. When selecting an embedded Java platform, you must take into account a host of other factors, beyond the scope of this article, that have an impact on performance. Among them are hardware processor selection, Java compatibility and supported APIs, application reliability and scalability, the choice of a real-time operating system (RTOS) with associated native libraries and drivers, the availability of Java development tool kits and middleware, graphics support, and the ability to put the application code into ROM.

Once you’ve selected a hardware and software development platform, several factors can help you choose the best-performing JVM for your application.

Java Is Big and Slow: Myth or Reality?
Although the average Java bytecode application executes about ten times more slowly than the same program written in C or C++, how well an application is written in Java can have a tremendous impact on its performance. A study by Lutz Prechelt, “Comparing Java vs. C/C++ Efficiency Differences to Interpersonal Differences” (Communications of the ACM, October 1999), demonstrated this. In the study, 38 programmers were asked to write the same application program in either C/C++ or Java. Statistical analysis of the programs’ performance data revealed that actual performance differences depended more on the way the programs were written than on the language used. Indeed, the study showed that a well-written Java program could equal or exceed the efficiency of an average-quality C/C++ program.

Various approaches are available for boosting bytecode execution speed. They include using a just-in-time (JIT) compiler, an ahead-of-time compiler, or a dynamic adaptive compiler; putting the Java application code into ROM (“ROMizing” it); rewriting the JVM’s bytecode interpretation loop in assembly language; and using a Java hardware accelerator.

Consider Compilers
JIT compilers, which compile bytecode on the fly during execution, generally aren’t suitable for embedded applications. They produce excellent performance improvements in desktop Java applications but typically require 16 to 32 MB of RAM in addition to the application’s own requirements. That large memory requirement puts JIT compilers out of reach for many categories of embedded applications.

Ahead-of-time compilers rival JIT compilers in increasing Java execution speed. Unlike JIT compilers, they’re used before the application is loaded onto the target device, as their name indicates. That eliminates the need for extra RAM, but it creates the need for more ROM or flash memory (that is, static storage memory), because compiled machine code requires four to five times the memory of Java bytecode. Compiling ahead of time also tends to undermine one of the great benefits of the Java platform, because some dynamic extensibility can be lost: it may not be possible to download new versions of compiled classes. Additionally, any dynamically loaded code, like an applet, won’t benefit from ahead-of-time compilation and will execute more slowly than resident compiled code.

Profiling Java code, although somewhat complex, can help minimize code expansion when you’re using an ahead-of-time compiler. A good goal is to compile only that 20 percent of the Java classes in which the application spends 80 percent or more of its time.
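The 80/20 selection step can be sketched in a few lines. This is a minimal sketch, assuming your profiler can export the time spent in each class as a map from class name to milliseconds; the class and method names, and the profile values, are hypothetical. It sorts classes by time and keeps taking the hottest ones until the chosen fraction of total time is covered.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Picks the smallest set of hottest classes whose profiled time covers
// at least targetFraction of the total (e.g. 0.80 for the 80/20 rule).
class HotClassSelector {

    static List<String> selectHotClasses(Map<String, Long> timeByClass, double targetFraction) {
        long total = 0;
        for (long t : timeByClass.values()) {
            total += t;
        }
        // Sort by descending time so the hottest classes are taken first.
        List<Map.Entry<String, Long>> entries = new ArrayList<>(timeByClass.entrySet());
        entries.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));

        List<String> selected = new ArrayList<>();
        long covered = 0;
        for (Map.Entry<String, Long> e : entries) {
            if (total == 0 || covered >= targetFraction * total) {
                break; // enough time is covered; leave the rest as bytecode
            }
            selected.add(e.getKey());
            covered += e.getValue();
        }
        return selected;
    }

    public static void main(String[] args) {
        // Hypothetical profile (milliseconds per class).
        Map<String, Long> profile = new LinkedHashMap<>();
        profile.put("app.Parser", 520L);
        profile.put("app.Codec", 310L);
        profile.put("app.Ui", 90L);
        profile.put("app.Config", 50L);
        profile.put("app.Logger", 30L);
        System.out.println(selectHotClasses(profile, 0.80)); // [app.Parser, app.Codec]
    }
}
```

Only the returned classes would be handed to the ahead-of-time compiler; everything else stays as compact bytecode, which is what limits the code expansion.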

Dynamic adaptive compilers offer a good compromise between JIT and ahead-of-time compilers (see Table 1). They’re similar to JIT compilers in that they translate bytecode into machine code on the fly. Dynamic adaptive compilers, however, perform statistical analysis on the application code to determine where the code merits compiling and where it’s better to let the JVM interpret the bytecode. The memory used by this type of compiler is user-configurable, so you can evaluate the trade-off between memory and speed and decide how much memory to allocate to the compiler.
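The interpret-or-compile decision can be illustrated with a simple counter-based policy. This is a conceptual sketch, not how any particular JVM works: it assumes a fixed invocation threshold, a user-configured code-cache budget in bytes, and compiled code that is four times the bytecode size (taken from the four-to-five-times figure for ahead-of-time compilation). All names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical adaptive policy: interpret a method until it has been
// invoked `threshold` times, then compile it if the code cache has room.
class AdaptivePolicy {
    private final int threshold;
    private final int codeCacheBudgetBytes; // user-configurable memory for compiled code
    private int codeCacheUsedBytes = 0;
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new LinkedHashSet<>();

    AdaptivePolicy(int threshold, int codeCacheBudgetBytes) {
        this.threshold = threshold;
        this.codeCacheBudgetBytes = codeCacheBudgetBytes;
    }

    /** Records one invocation; returns true if this call runs as compiled code. */
    boolean onInvoke(String method, int bytecodeSize) {
        if (compiled.contains(method)) {
            return true;
        }
        int n = counts.merge(method, 1, Integer::sum);
        int nativeSize = bytecodeSize * 4; // assumed expansion factor
        if (n >= threshold && codeCacheUsedBytes + nativeSize <= codeCacheBudgetBytes) {
            compiled.add(method); // later calls use the compiled version
            codeCacheUsedBytes += nativeSize;
        }
        return false; // this invocation was interpreted
    }

    Set<String> compiledMethods() {
        return compiled;
    }

    public static void main(String[] args) {
        AdaptivePolicy policy = new AdaptivePolicy(1000, 4096);
        for (int i = 0; i < 5000; i++) {
            policy.onInvoke("Codec.decode", 600);    // hot method
            if (i % 100 == 0) {
                policy.onInvoke("Config.load", 300); // rarely called
            }
        }
        policy.onInvoke("Report.render", 900);       // called once
        System.out.println(policy.compiledMethods()); // [Codec.decode]
    }
}
```

Raising the budget lets more methods be compiled at the cost of RAM; lowering it keeps more code interpreted, which is the memory/speed trade-off described above.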

Placing the bytecode into ROM can contribute to faster application performance. It doesn’t make the bytecode itself execute faster, but it does translate the code into a format that the JVM can execute directly from ROM. That makes the code load faster by eliminating class loading and code verification, tasks the JVM normally performs at run time.

Another way to speed up bytecode execution without using ahead-of-time or dynamic compilation is to rewrite the JVM’s bytecode interpreter. Because the interpreter is a large C program, you can make it run faster by rewriting its core interpretation loop in assembly language.
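To show what the interpretation loop is, here is a minimal switch-based interpreter for a made-up stack-machine instruction set (these are not real JVM opcodes). It is written in Java only for illustration; a JVM’s real loop is in C and far larger. Every instruction pays for a fetch, a decode, and a dispatch branch, and that per-instruction overhead is what a hand-tuned assembly version tries to shrink.

```java
// Toy interpreter loop over a hypothetical 4-opcode stack machine.
class TinyInterpreter {
    static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;

    static int run(int[] code) {
        int[] stack = new int[16];
        int sp = 0; // next free stack slot
        int pc = 0; // index of the next instruction
        while (true) {
            int op = code[pc++];          // fetch
            switch (op) {                 // decode and dispatch
                case PUSH:
                    stack[sp++] = code[pc++];
                    break;
                case ADD: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a + b;
                    break;
                }
                case MUL: {
                    int b = stack[--sp], a = stack[--sp];
                    stack[sp++] = a * b;
                    break;
                }
                case HALT:
                    return stack[sp - 1];
                default:
                    throw new IllegalStateException("bad opcode " + op);
            }
        }
    }

    public static void main(String[] args) {
        // Computes (2 + 3) * 4.
        int[] program = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT};
        System.out.println(run(program)); // 20
    }
}
```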

Java hardware accelerators, or Java chips, are the ultimate option for speeding up code execution. They’re emerging in two fundamental configurations. Chips of the first type, such as Chicory Systems’ HotShot and Nazomi Communications’ JSTAR, operate as Java coprocessors in conjunction with a general-purpose microprocessor, in much the same way that graphics accelerators are used. Java chips in the other category, like Patriot Scientific’s PSC1000 and aJile’s aJ-100, replace the general-purpose CPU.

Clearly, the latter are limited to applications that can be written entirely in Java. As for the first type, adding components raises costs, so coprocessors are a viable option only when the extra cost is acceptable. Indeed, the price of Java chips has been high because of relatively low production volumes. A high-volume solution, however, may be forthcoming in the form of ARM9-family general-purpose processor cores with an integrated Java accelerator, called Jazelle.

Memory Requirements
The Prechelt study determined that the average memory requirement of a program written in Java is two to three times greater than for one written in C/C++. Even the compact nature of bytecode, usually about 50 percent smaller than compiled C/C++ machine code, can’t offset that overhead. Recognizing that trying to drop Java in its original, desktop-oriented form into embedded systems won’t work, Sun Microsystems, Java’s originator, took the language through several evolutionary steps in an effort to tailor it to the embedded environment. Today, the Java 2 Platform, Micro Edition (J2ME), represents the latest, most evolved, and slimmest version of Java for the embedded space.

You can trim J2ME by eliminating classes and code components that your application doesn’t need. The JVM, native libraries, core classes, and application bytecode go into ROM. JVMs for embedded applications generally run under 500 kB, whereas class libraries for J2ME typically don’t exceed 1.5 MB. The Java components that affect RAM requirements are the JVM (for bytecode execution), any dynamic compiler, the Java heap, and the threads (the heap size and the number of threads obviously depend on the application). Executing as much of the application as possible in the interpreter, while still maintaining acceptable execution performance, helps contain the memory footprint.
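One rough way to see how much of the Java heap a given part of your application needs is to compare heap usage before and after creating its data. This sketch uses only java.lang.Runtime methods (totalMemory, freeMemory, gc), which are also present in small profiles such as CLDC. System.gc() is only a request, so the figures are approximate, and they say nothing about thread stacks or memory used by the JVM or a compiler. The buffer sizes are hypothetical.

```java
// Approximate Java heap measurement using java.lang.Runtime.
class HeapProbe {
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a hint only; the JVM may ignore it
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        int[][] buffers = new int[64][1024]; // hypothetical application buffers
        long after = usedHeapBytes();
        System.out.println("approx. heap used by buffers: " + (after - before) + " bytes");
        // Use the buffers after the measurement so they stay reachable during it.
        System.out.println("buffers allocated: " + buffers.length);
    }
}
```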

Selecting a highly scalable operating system and C run-time package allows you to tune these software components for optimal memory efficiency. Scaling the Java environment can be complex, however, and usually involves two stages. First, you can use the command-line verbose option, java -v, to see which classes an application uses and then manually extract the needed libraries and classes. If this doesn’t save enough space, you can then use filtering tools, such as JavaFilter from Sun’s EmbeddedJava platform.
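The first stage, turning verbose output into a list of classes to keep, is easy to automate. This sketch assumes a desktop JDK, where the verbose class-loading option is spelled -verbose:class; it recognizes two output formats, the JDK 8 style “[Loaded java.lang.Object from ...]” and the JDK 9+ unified-logging style “[...][class,load] java.lang.Object source: ...”. Older or embedded JVMs, including ones that accept java -v, may print something different, so adjust the patterns to match your JVM. MyApp is a placeholder.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Collects the names of loaded classes from verbose class-loading output.
class LoadedClasses {
    // JDK 8 and earlier style: [Loaded java.lang.Object from ...]
    private static final Pattern LEGACY = Pattern.compile("^\\[Loaded (\\S+) from ");
    // JDK 9+ unified logging style: [0.013s][info][class,load] java.lang.Object source: ...
    private static final Pattern UNIFIED = Pattern.compile("\\[class,load\\s*\\]\\s+(\\S+) source: ");

    static Set<String> parse(Iterable<String> lines) {
        Set<String> classes = new TreeSet<>(); // sorted, no duplicates
        for (String line : lines) {
            Matcher m = LEGACY.matcher(line);
            if (!m.find()) {
                m = UNIFIED.matcher(line);
                if (!m.find()) {
                    continue; // application output or unrelated log lines
                }
            }
            classes.add(m.group(1));
        }
        return classes;
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical app): java -verbose:class MyApp | java LoadedClasses
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            lines.add(line);
        }
        for (String name : parse(lines)) {
            System.out.println(name);
        }
    }
}
```

The resulting list is a starting point for manual extraction: run the application through all of its important code paths first, because classes that are never loaded during the run will not appear.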

If you’re using Java, you should expect to increase memory and CPU resources compared with using C/C++ (see Table 2).

Choosing the Right Java Platform
Your choice of JVM is one key to optimizing Java performance for your application, and it needs to be a JVM designed for embedded applications.

Embedded JVMs are highly user-configurable to match different embedded system requirements, but which embedded JVM should you use? Java benchmarks are meant to help you evaluate JVMs and Java performance, but you need to be careful about which ones you use and about the conclusions you draw from them. A good benchmark score for a particular JVM doesn’t necessarily mean that using it will make your application go faster.

Consequently, before evaluating a JVM, you have to evaluate any benchmark to determine how meaningful it may be to your application, taking into account the whole Java environment that’s associated with it. Some benchmarks are very application-specific (a chat server benchmark like VolanoMark, for instance) and may not apply to the kind of Java applications you’re developing. Additionally, because JVM vendors commonly optimize their products to achieve good benchmark scores, the scores can be misleading about how much a given JVM will improve the performance of your particular application. Conversely, if your application has specific problems in certain areas, an environment that’s optimized to improve general processing won’t solve those specific processing problems.

Measuring Application Performance
When considering a benchmark to determine the overall performance of a Java application, bear in mind that bytecode execution, native code execution, and graphics each play a role. Their impact varies depending on the nature of the specific application: what the application does, how much of it is bytecode versus native code, and how much use it makes of graphics. How well a JVM will perform for a given application depends on how the unique mix of these three functional areas maps onto its capabilities. Given these variables, the best way to benchmark a JVM is against your own application. Since that’s not possible before the application has been written, you must find those benchmarks that are most relevant to the application you intend to write.
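Amdahl’s law gives a simple way to reason about that mix: a JVM’s gain in one area only speeds up the fraction of run time spent in that area. The sketch below computes the overall speedup from per-area time fractions and per-area speedup factors. The 40/35/25 split and the tenfold bytecode gain are hypothetical placeholders for numbers you would measure on your own application.

```java
// Overall speedup for an application whose run time is split among
// bytecode, native code, and graphics (Amdahl's law).
class MixSpeedup {
    /**
     * @param fractions share of baseline run time in each area (should sum to 1)
     * @param speedups  speedup factor the candidate JVM achieves in each area
     */
    static double overallSpeedup(double[] fractions, double[] speedups) {
        double newTime = 0.0;
        for (int i = 0; i < fractions.length; i++) {
            newTime += fractions[i] / speedups[i]; // time left in this area
        }
        return 1.0 / newTime;
    }

    public static void main(String[] args) {
        // Hypothetical mix: 40% bytecode, 35% native code, 25% graphics.
        double[] mix = {0.40, 0.35, 0.25};
        // Candidate JVM: 10x faster bytecode, native and graphics unchanged.
        double[] jvm = {10.0, 1.0, 1.0};
        System.out.printf("overall speedup: %.2fx%n", overallSpeedup(mix, jvm)); // 1.56x
    }
}
```

Even a tenfold bytecode improvement yields only about a 1.56x overall gain for this mix, which is why a benchmark that exercises only bytecode can overstate what a JVM will do for a native-code- or graphics-heavy application.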

Sorting through Java benchmarks to find the ones that are relevant for embedded applications can be confusing. SPECjvm98, for example, provides a relatively complete set of benchmarks that test diverse aspects of the JVM. That sounds good, but SPECjvm98 runs in a client/server environment and requires a minimum of 48 MB of RAM on the client side for the JVM, which rules it out for most embedded applications. In addition, it can’t be used with precompiled classes.

Other benchmarks have different pitfalls. VolanoMark, for example, is a chat server implementation and is therefore relevant only for benchmarking applications with the same set of requirements as chat servers. The JMark benchmark assumes that the application includes the applet viewer and a full implementation of Java’s Abstract Window Toolkit (AWT). This benchmark can be irrelevant for the many embedded applications that have no graphics, or that have limited graphics that don’t require full AWT support, such as devices running a PersonalJava minimal-AWT implementation.

Embedded CaffeineMark (ECM), the embedded version of the CaffeineMark benchmark from Pendragon Software (it has no graphics tests), is easy to run on any embedded JVM, since it requires support for basic Java core classes only, and it doesn’t require a large amount of memory. More importantly, there’s a high correlation between good scores on this benchmark and improved bytecode performance in embedded applications.

To get the most meaningful results from ECM, you must use exactly the same hardware when testing different JVMs. You must also pay attention to implementation differences among the JVMs you’re testing. If, for example, you’re comparing a JVM with a JIT compiler against a JVM without one, it’s important to run the JVM that has the JIT with the java -nojit option on the command line to ensure an apples-to-apples comparison.

ECM will typically make any JVM that uses compilation look good, whatever the type of compilation, because it includes a very small set of classes and repeats the same small set of instructions over and over. Dynamic compilers simply cache the complete translation of the Java code in RAM and execute subsequent iterations of the tests in native code. Ahead-of-time compilers can easily optimize the loops and algorithms used in ECM, too.
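You can observe this caching effect with a tiny harness that reports the first (cold) iteration of a small, repetitive kernel separately from the average of the later (warm) iterations. This is a sketch, not part of ECM: the sieve kernel and the iteration count are arbitrary choices. Run it once normally and once with the JIT disabled (-nojit on the JVMs discussed here, or -Xint on HotSpot) to compare compiled and purely interpreted behavior on the same hardware.

```java
// Cold-versus-warm timing of a small kernel that is repeated many times.
class WarmupTimer {
    // Stand-in kernel: sieve of Eratosthenes, counts primes <= limit.
    static int countPrimes(int limit) {
        boolean[] composite = new boolean[limit + 1];
        int count = 0;
        for (int i = 2; i <= limit; i++) {
            if (!composite[i]) {
                count++;
                for (long j = (long) i * i; j <= limit; j += i) {
                    composite[(int) j] = true;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int iterations = 50;
        long coldNanos = 0, warmNanos = 0;
        int result = 0;
        for (int k = 0; k < iterations; k++) {
            long start = System.nanoTime();
            result = countPrimes(100_000);
            long elapsed = System.nanoTime() - start;
            if (k == 0) {
                coldNanos = elapsed;   // includes interpretation/compilation
            } else {
                warmNanos += elapsed;  // may run from cached native code
            }
        }
        System.out.println("primes found: " + result); // 9592
        System.out.println("cold iteration (us): " + coldNanos / 1_000);
        System.out.println("mean warm iteration (us): " + warmNanos / (iterations - 1) / 1_000);
    }
}
```

A large cold/warm gap on a kernel this small says little about an application whose hot code is spread across many classes.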

Although the industry abounds with other Java benchmarks, like Java Grande, SciMark, jBYTEmark, Dhrystone benchmark in Java, and UCSD Benchmarks for Java, there is no “ultimate” benchmark that can give you certainty about Java and JVM performance in embedded applications. The best strategy is to identify a suite of benchmarks that seem most relevant to your application and use the combined results of those benchmarks to help predict Java performance in a particular system environment.
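One common way to combine a suite’s results, and the one SPEC uses for its composite scores, is to normalize each score against a reference JVM and take the geometric mean of the ratios, so that no single benchmark dominates. The sketch below adds optional weights, an assumption on our part, so you can emphasize the benchmarks you judge most relevant to your application. It assumes every benchmark reports higher-is-better scores measured on the same hardware; the scores are hypothetical.

```java
// Weighted geometric mean of candidate/reference benchmark score ratios.
class SuiteScore {
    static double combinedRatio(double[] candidate, double[] reference, double[] weights) {
        double logSum = 0.0, weightSum = 0.0;
        for (int i = 0; i < candidate.length; i++) {
            logSum += weights[i] * Math.log(candidate[i] / reference[i]);
            weightSum += weights[i];
        }
        return Math.exp(logSum / weightSum);
    }

    public static void main(String[] args) {
        // Hypothetical scores for three benchmarks.
        double[] referenceJvm = {100.0, 200.0, 50.0};
        double[] candidateJvm = {150.0, 200.0, 100.0};
        // Equal weights; raise a weight for a benchmark closer to your application.
        double[] weights = {1.0, 1.0, 1.0};
        System.out.printf("candidate vs. reference: %.3f%n",
                combinedRatio(candidateJvm, referenceJvm, weights)); // 1.442
    }
}
```

A result above 1.0 means the candidate JVM is faster than the reference on the weighted suite as a whole, even if it loses on individual benchmarks.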

Furthermore, the existing benchmarks may not measure other aspects of your application code. Tuning Java applications to meet performance goals may require addressing many program functions besides bytecode execution. Some of those functions—for example, thread management, synchronization, method-to-method calls, class resolution, object allocation and heap management (including garbage collection), calls to native methods, bytecode verification, and exception handling—occur within the JVM. Because few if any benchmarks address such functions, it falls to you to conduct an in-depth study of a JVM’s internals to understand how its design may affect crucial aspects of your application. Writing special programs that exercise critical aspects of a JVM can help you evaluate it for the application. If, for example, your application uses a heavy mix of Java and C code, you can benefit by writing a program that tests native method call performance. Other functions, including native code execution and such factors as network latency, may occur outside the JVM.
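A special-purpose test program can be very small. The article’s example, native method call performance, needs a separate JNI library written in C, so this self-contained sketch swaps in another JVM-internal function from the list above: synchronization. It times the same counter update through a plain method and through a synchronized method. Treat the numbers with care, because some JVMs optimize away uncontended locks, and the loop counts are arbitrary.

```java
// Microbenchmark sketch: plain versus synchronized method calls.
class SyncCost {
    int plainCount;
    int syncCount;

    void incPlain() { plainCount++; }

    synchronized void incSync() { syncCount++; }

    static long timeNanos(Runnable body) {
        long start = System.nanoTime();
        body.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        final int n = 10_000_000;
        final SyncCost c = new SyncCost();
        // Warm up both paths so a dynamic compiler, if present, can act first.
        for (int i = 0; i < 100_000; i++) {
            c.incPlain();
            c.incSync();
        }
        long plain = timeNanos(() -> { for (int i = 0; i < n; i++) c.incPlain(); });
        long sync = timeNanos(() -> { for (int i = 0; i < n; i++) c.incSync(); });
        System.out.printf("plain:        %.2f ns/call%n", (double) plain / n);
        System.out.printf("synchronized: %.2f ns/call%n", (double) sync / n);
        // Print the counters so the work cannot be discarded as unused.
        System.out.println("counts: " + c.plainCount + " " + c.syncCount);
    }
}
```

The same pattern (warm up, time many identical calls, print a result that depends on the work) applies to object allocation, exception handling, or, with a JNI library, native method calls.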

Graphics Performance
What if your application includes graphics? To start, two major factors affect graphics performance in Java applications: Does the application’s graphics display driver use graphics coprocessor hardware acceleration? Is the application configured with a lightweight (faster) or a heavyweight (slower) implementation of the Abstract Window Toolkit? (See the figure.) In addition, like any other high-level Java service, graphics performance is affected by the way the graphics services integrate with lower-level native libraries.

Wind River’s Personal JWorks includes a good benchmark for evaluating graphics performance in embedded systems. The benchmark targets the PersonalJava AWT with a set of 39 tests of images, buttons, scrolling, text, and basic 2-D graphics.

Real-World Performance
Finally, you need to consider the performance of your CPU. To help you identify CPU-bound performance, you should supplement simple benchmarks by running real-world applications that exercise large amounts of different, complex Java code. Such test code must meet several requirements: it should contain a number of classes that reflects an estimate of the real application (20-plus is a good ballpark), it must be large (thousands of lines, at least), and it must have no file system access and no graphics. Some existing programs meet all of these criteria. The GNU regular expression package, regexp, for example, comprises about 3,000 lines of code and more than 21 classes, and provides a large number of expressions to parse and match. Another candidate, the Bean Shell interpreter running a simple prime number sieve, involves 70 classes and several thousand lines of code. JavaCodeCompact, Sun’s PersonalJava ROMizing tool, would also make a good test program.

The result of running these programs as test cases illustrates the wide variance in the meaning of benchmark scores. For example, a JVM using a JIT compiler may run Embedded CaffeineMark up to 30 times faster than when the nojit option is turned on (thus running in pure interpretation mode), but the same JVM runs the Bean Shell and regexp tests only about one and a half times faster when using the JIT compiler. (The apparently impressive thirtyfold speedup on a simple benchmark like Embedded CaffeineMark is achieved through caching techniques that the compiler uses on the small amount of code and classes in ECM.) The difference in results clearly demonstrates that high benchmark scores may not translate into a commensurate level of performance improvement in real-world applications.

Actually, SPECjvm98 and JMark yield results that most closely approximate those for real-world applications. They do, however, suffer from the limitations discussed above. In particular, the usefulness of SPECjvm98 in the embedded space depends largely on whether you can meet its test infrastructure requirements: a client/server environment with at least 48 MB of RAM on the client side.

Vincent Perrier is Product Marketing Manager, Java Platforms, for the Wind River Platforms Business Unit in Alameda, CA. He can be found at http://www.windriver.com