Friday, September 20, 2024

Continuous Performance Testing with JUnitPerf

Performance tuning software is like a wicked game of Twister. You end up using arms and legs to keep the performance knobs in perfect harmony. And just when you think you’ve got it all under control, tweaking code or changing the runtime environment can send performance into a death spiral. In this article we’ll learn how to tune with more confidence and less stress by writing automated performance tests that tell no lies.

Grist for the tuning mill

Imagine we’ve accepted a mission to develop yet another online product catalog using a stateless session bean facade. But before we dive into the implementation of our product catalog service, let’s start by writing a test. Why? Well, how else will we know what code to write? If we write a test first, we’ll have an example of the catalog service’s intended use. In addition to demonstrating the intent of the catalog service, the test can automatically validate that the catalog service returns the correct results. Once the test passes, we’re done!

Listing 1 shows the JUnit test, which queries for all products in the snowboard category of the product catalog. (Note: a full tutorial on JUnit goes beyond the scope of this article. The JUnit Primer will help get you up and running quickly).

Listing 1: Unit testing the product catalog service

import java.util.Collection;
import java.util.Iterator;

import junit.framework.TestCase;

public class CatalogTest extends TestCase {

    public CatalogTest(String name) {
        super(name);
    }

    public void testGetProducts() throws Exception {

        String snowboardCategory = "Snowboard";

        // getCatalogHome() (not shown) looks up the Catalog EJB's home interface.
        Catalog catalog = (Catalog) getCatalogHome().create();

        Collection products =
            catalog.getProductsByCategory(snowboardCategory);

        // The test database was seeded with exactly 25 snowboard products.
        assertEquals(25, products.size());

        // Every product returned should belong to the requested category.
        Iterator productIter = products.iterator();
        while (productIter.hasNext()) {
            ProductDetails product = (ProductDetails) productIter.next();
            assertEquals(snowboardCategory, product.getCategory());
        }

        catalog.remove();
    }
}

Using the remote Catalog interface, the test queries for all products in the snowboard category. We expect exactly 25 products in this category because before running the test we created 25 example products in the snowboard category of the product database. Iterating over the resulting collection of products, the test validates that the catalog service only returns the products in the snowboard category.

To get the test to pass, we’ll have to write the code for the Catalog EJB, of course. For the sake of this article, just treat the Catalog interface as a black box. That is, take the perspective of the unit test. Notice that all business logic occurs on the server side, behind the facade of the Catalog interface. As such, our design details are encapsulated. If performance becomes an issue, we can tune the code behind the Catalog interface without adversely affecting its clients.
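To make the black box concrete, here is a minimal sketch of what the remote interface and its value object might look like, assuming standard EJB 2.x conventions; the exact signatures are ours, for illustration, not the book's code.

import java.io.Serializable;
import java.rmi.RemoteException;
import java.util.Collection;
import javax.ejb.EJBObject;

// A hedged sketch of the remote interface the unit test exercises.
// remove() is inherited from EJBObject, which is why the test can call it.
interface Catalog extends EJBObject {

    // Returns a Collection of ProductDetails for the given category.
    Collection getProductsByCategory(String category) throws RemoteException;
}

// A hedged sketch of the value object returned to clients; the test
// relies only on getCategory().
interface ProductDetails extends Serializable {

    String getCategory();
}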

Specifying response time as a measure of success

With much pride, we demonstrate the new catalog service to our customer, who is delighted that it works as advertised. After kicking the tires a bit, though, the consensus is that the service just isn’t fast enough; we need to improve performance.

The average response time of the web page to query and list 25 products is approximately 1.4 seconds. We think we can do better than that with a little tuning, but we need a way to measure success. So we ask our customer to write a new performance requirement. We watch as they scratch their heads and mumble a bit, but finally they draft a performance requirement:

The average response time of the catalog web page listing up to 25 products should not exceed 1 second.

Great, now we have a goal!

To satisfy our new performance requirement, we need to shave off about a half-second of the response time, then stop tuning before we hit the point of diminishing returns. However, we’re not sure whether to start chiseling in the presentation layer or the business logic layer. We want to focus our attention on tuning the sections of code that contribute the most overhead to the overall response time. Guessing would be a fool’s game, so we put our trusty code profiler on the case to hunt down the busiest code. Table 1 shows the results of running the profiler:

Table 1. Using a code profiler takes the guesswork out of tuning by identifying hot spots worth optimizing. Always seek the advice of a code profiler before attempting to tune code.

The code profiler identifies the CatalogEJB.getProductsByCategory() method as the major contributor to the total response time. Well, that rules out tuning the presentation layer for any observable gain in performance. The top contributor is the EJB method, not the servlet method that presents its results. The profiler results also rule out tuning the database interaction. The time it takes to obtain a database connection and execute the SQL query is insignificant compared to the business logic in the EJB.

We have no question that the culprit is the code in the EJB that transforms database rows into products. But when we’re done tuning, how will we know we’ve made good progress? What if our tuning activities cause performance to take a step backward? We need a test.

JUnitPerf overview

JUnitPerf is an open source set of JUnit extensions for automated performance testing. JUnitPerf tests transparently wrap standard JUnit tests and measure their performance. In other words, we build on our existing functional test, which makes sure the code continues to work right, while the JUnitPerf test tells us whether the code is fast enough. If the performance doesn’t meet expectations, the test fails; if the wrapped functional test fails, the performance test fails as well. Conversely, if the performance test passes, we have confidence that tuning didn’t break existing functionality. Table 2 describes the major JUnitPerf classes and interfaces.

Table 2. JUnitPerf is a collection of classes and interfaces for performance testing existing JUnit tests.

Because JUnitPerf tests can run any class that implements JUnit’s Test interface, we could use JUnitPerf to measure the performance of any test conforming to this interface. In this section, we use it to wrap the JUnit test we wrote earlier for our catalog service. We could also use JUnitPerf to run HttpUnit tests and measure the performance of our entire web application, for example. Another option might be to use JUnitPerf to run Cactus tests to validate our catalog service’s business logic from within the EJB container.

But that’s another day. Right now, our customer is getting nervous. Let’s put our money where our mouth is with an automated response time test for the catalog service.

Testing response time

Recall that we’re staring down the barrel of a response time of approximately 1.4 seconds to display 25 products on a web page. We won’t sleep well until the response time is under 1 second. Luckily, we know what to do. The code profiler unambiguously indicated that optimizing the CatalogEJB.getProductsByCategory() method would be the smart move. How will we know when we’re done? Well, when a performance test passes, of course.

We want to write a test that will fail if the response time of our use case exceeds 1 second. To do that, we create a JUnitPerf TimedTest instance that wraps our existing CatalogTest.testGetProducts() test case method. Listing 2 shows the JUnitPerf test used to validate our performance expectations.

Listing 2: Testing the response time of the catalog service

import com.clarkware.junitperf.TimedTest;
import junit.framework.Test;

public class CatalogResponseTimeTest {

    public static Test suite() {

        long maxTimeInMillis = 1000;

        // Decorate the functional test with a response time threshold.
        Test test = new CatalogTest("testGetProducts");
        Test timedTest = new TimedTest(test, maxTimeInMillis);
        return timedTest;
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}

As a convenient way to run the test, the class defines a suite() method, which the JUnit text-based test runner invokes from main(). We run the CatalogResponseTimeTest, and it fails with the following output:

.TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 1352 ms
F
Time: 1.352
There was 1 failure:
1)testGetProducts(com.bitterejb.catalog.ejb.CatalogTest)
Maximum elapsed time exceeded! Expected 1000ms, but was 1352ms.

FAILURES!!!
Tests run: 1,  Failures: 1,  Errors: 0

All right, we knew that would happen. We just wanted to see if the test was really measuring anything. The test expected the response time to be less than 1 second, but sure enough, it measured the same response time as observed by our customer — 1.4 seconds. Now we have a solid baseline from which to work. We should be able to optimize code, improving the response time until the test passes. If the test doesn’t eventually pass, we’ll need to start turning the performance knob in the other direction or look for another knob to turn.

Be aware of a subtle “gotcha!” when writing JUnitPerf tests. The response time measured by a TimedTest includes the elapsed time of the testXXX() method and its test fixture — the setUp() and tearDown() methods. Therefore, the maximum elapsed time specified in the TimedTest should be adjusted accordingly to take into account any cost of the existing test’s fixture.
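For example, here is a hedged sketch of how we might budget for fixture overhead when setting the threshold; the class name and the 200 ms figure are assumptions for illustration, not measured values.

import com.clarkware.junitperf.TimedTest;
import junit.framework.Test;

public class CatalogResponseTimeWithFixtureTest {

    public static Test suite() {

        // Suppose setUp()/tearDown() add roughly 200 ms (an assumed figure).
        // Budget for that cost on top of the 1-second response time goal.
        long fixtureOverheadInMillis = 200;
        long maxTimeInMillis = 1000 + fixtureOverheadInMillis;

        Test test = new CatalogTest("testGetProducts");
        return new TimedTest(test, maxTimeInMillis);
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}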

Tweaking code

To get the test to pass, we follow the code profiler’s advice and optimize the logic that creates ProductDetails objects from the rows in our database. The SQL query itself is sufficiently fast, according to the profiler, so nothing is gained by barking up that tree. After optimizing, we run the CatalogResponseTimeTest again, and it gives us the following output:

.TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 751 ms

Time: 0.751

OK (1 test)

Hey, that did the trick! The test tells us that we made good progress. Had the test failed, we could have continued to optimize until it passed.

After showing off the improved catalog service to our customer, we add this test to our suite of performance tests. As we go forward, the automated test will continue to keep the response time of this use case in check.
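A hedged sketch of what that aggregate suite might look like; the class name is ours, and it simply collects the JUnitPerf suites we want to run with every build.

import junit.framework.Test;
import junit.framework.TestSuite;

public class PerformanceTests {

    public static Test suite() {
        TestSuite suite = new TestSuite("Catalog performance tests");
        // Each performance test contributes its own suite() of wrapped tests.
        suite.addTest(CatalogResponseTimeTest.suite());
        // Future JUnitPerf tests get added here as requirements grow.
        return suite;
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}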

Specifying scalability as a measure of success

At this point we know how long it takes for one user to get a list of 25 products using the catalog service. How long will the same process take if our application is under the stress of multiple concurrent users? Until now we haven’t thought much about scalability, but we’re confident that our simple design won’t let us down. Here’s where the rubber meets the road.

Our customer is impressed with our track record of meeting goals, and now is ready to hand us a new challenge. Performance planning early and often has enabled the customer to accurately estimate the expected load on the production system. The customer now wants to improve upon our performance success by writing a new performance requirement, which states:

The response time of the catalog web page listing up to 25 products should not exceed 1 second under a load of five concurrent users.

In other words, the catalog service should scale to handle five concurrent users while consistently maintaining the single-user response time we demonstrated earlier. That’s a tall order. Let’s use JUnitPerf to see how far off the mark our application currently is.

Testing response time under load

We have an automated JUnitPerf test that measures single-user response time. We’d like to use a similar testing technique to put this test under a load of five concurrent users while measuring each user’s response time. We want the test to fail if any user’s response time exceeds one second.

To do so, we write a JUnitPerf test that creates a LoadTest instance passing in a TimedTest instance and a number of concurrent users. The TimedTest in turn wraps our existing CatalogTest.testGetProducts() test case method. Listing 3 shows the JUnitPerf test used to validate our scalability expectations.

Listing 3: Testing the scalability of the catalog service

import com.clarkware.junitperf.LoadTest;
import com.clarkware.junitperf.TimedTest;
import junit.framework.Test;

public class CatalogLoadTest {

    public static Test suite() {

        long maxTimeInMillis = 1000;
        int concurrentUsers = 5;

        // Each of the five simulated users runs the timed functional test.
        Test test = new CatalogTest("testGetProducts");
        Test timedTest = new TimedTest(test, maxTimeInMillis);
        Test loadTest = new LoadTest(timedTest, concurrentUsers);
        return loadTest;
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}

We run the CatalogLoadTest, which fails with the following output:

..........
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 771 ms
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 1372 ms
F
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 1963 ms
F
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 2584 ms
F
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 3255 ms
F
Time: 3.40
There were 4 failures:

1)testGetProducts(com.bitterejb.catalog.ejb.CatalogTest)
  Maximum elapsed time exceeded! Expected 1000ms, but was 1372ms.
2)testGetProducts(com.bitterejb.catalog.ejb.CatalogTest)
  Maximum elapsed time exceeded! Expected 1000ms, but was 1963ms.
3)testGetProducts(com.bitterejb.catalog.ejb.CatalogTest)
  Maximum elapsed time exceeded! Expected 1000ms, but was 2584ms.
4)testGetProducts(com.bitterejb.catalog.ejb.CatalogTest)
  Maximum elapsed time exceeded! Expected 1000ms, but was 3255ms.

. . .

FAILURES!!!
Tests run: 5,  Failures: 4,  Errors: 0

Ouch! Our application can’t scale beyond one user. Notice that the first user’s response time is within the 1-second limit, but the other users’ response times bust the threshold. Worse yet, the response times increased for each successive user, indicating that our application has a bottleneck restricting its ability to scale.

So we fire up the code profiler and run the CatalogLoadTest to obtain clues. The code profiler doesn’t let us down. Table 3 shows what the profiler finds when the catalog service is under load.

Table 3. Running a profiler on the catalog service under load reveals contention for a database connection. Load testing tools help illuminate scalability bottlenecks.

The CatalogEJB.getConnection() method, which took only about 10 milliseconds in our initial profiler run, now accounts for the majority of the overall response time. Let’s tune that method while continuing to test the single-user response time.

Using a connection pool to increase throughput

Based on the evidence provided by the code profiler, we conclude that a single database connection is the limiting factor to scaling our application. Consequently, requests for a connection are being queued. Each successive user’s response time in turn rises above the desired threshold.
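For illustration, here is a hedged sketch of the kind of code that produces this queuing; the class, table, and column names are our guess at the shape of the problem, not the actual EJB code.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// A hedged "before" picture: all callers share one connection and synchronize
// on it, so concurrent queries execute one at a time.
public class SharedConnectionCatalogDao {

    private final Connection sharedConnection;  // single connection for all users

    public SharedConnectionCatalogDao(Connection connection) {
        this.sharedConnection = connection;
    }

    public int countProductsInCategory(String category) throws SQLException {
        synchronized (sharedConnection) {  // every concurrent user queues here
            PreparedStatement stmt = sharedConnection.prepareStatement(
                "SELECT COUNT(*) FROM products WHERE category = ?");
            stmt.setString(1, category);
            ResultSet rs = stmt.executeQuery();
            rs.next();
            int count = rs.getInt(1);
            rs.close();
            stmt.close();
            return count;
        }
    }
}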

In this case, pooling database connections is a low cost, high reward change sure to improve scalability. Before you start rolling your eyes over yet another example demonstrating the virtues of connection pooling, allow us to explain. We realize connection pooling is the poster child for many discussions on performance. Indeed, it’s a well-known performance problem, and that’s exactly why we’re using it here. We want the problem — and the solution — to be familiar so you can focus on how to test it. It’s not about the connection pool; it’s about the technique to discover it.

It’s also worth noting that we’ve run across more than one improperly sized connection pool. Worse yet, we’ve seen custom connection pools that were implemented incorrectly. (Yet another victim of the Not Invented Here antipattern.) In other words, just because you’re using a connection pool doesn’t necessarily mean you get instant scalability. You have to test for that, so let’s get back to the technique.

Instead of synchronizing access to a single database connection shared by multiple users, we refactor our Catalog EJB to use a database connection pool. We then configure the pool size to five active connections to improve scalability.
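A minimal sketch of the refactored lookup, assuming the pool is exposed as a container-managed DataSource in JNDI; the JNDI name and helper class are ours, for illustration only.

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

// A hedged "after" picture: each request borrows a connection from the pool
// and returns it on close(), so users no longer wait on one shared connection.
public class PooledConnectionFactory {

    // The JNDI name is an assumption; use whatever name the pool is deployed under.
    private static final String POOL_JNDI_NAME = "java:comp/env/jdbc/CatalogDB";

    public static Connection getConnection() throws SQLException, NamingException {
        InitialContext ctx = new InitialContext();
        DataSource pool = (DataSource) ctx.lookup(POOL_JNDI_NAME);
        return pool.getConnection();  // close() returns the connection to the pool
    }
}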

Then we run the CatalogLoadTest again, and it passes with the following output:

.....
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 751 ms
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 812 ms
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 822 ms
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 831 ms
TimedTest (WAITING): testGetProducts(com.bitterejb.catalog.ejb.CatalogTest): 811 ms

Time: 0.972

OK (5 tests)

Outstanding! Our scalability test is passing, and the underlying functional test continues to pass, which tells us that refactoring to use a database connection pool didn’t break anything. As we expected, the refactoring improved scalability: because requests no longer queue for a single connection, the response times are fairly consistent across concurrent users. The test validates that our design can handle five concurrent users without any one of them experiencing a delayed response time. If, in the future, the response time of any user exceeds the limit set in our load test, the test will fail.
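As a side note, JUnitPerf can also stagger the arrival of simulated users instead of starting them all at once. Here is a hedged variation of Listing 3, assuming the ConstantTimer class that ships with JUnitPerf and an illustrative 500 ms delay between user arrivals.

import com.clarkware.junitperf.ConstantTimer;
import com.clarkware.junitperf.LoadTest;
import com.clarkware.junitperf.TimedTest;
import com.clarkware.junitperf.Timer;
import junit.framework.Test;

public class CatalogStaggeredLoadTest {

    public static Test suite() {

        long maxTimeInMillis = 1000;
        int concurrentUsers = 5;

        // Start each simulated user 500 ms after the previous one
        // (the delay is an illustrative choice, not a requirement).
        Timer userArrivalTimer = new ConstantTimer(500);

        Test test = new CatalogTest("testGetProducts");
        Test timedTest = new TimedTest(test, maxTimeInMillis);
        return new LoadTest(timedTest, concurrentUsers, userArrivalTimer);
    }

    public static void main(String[] args) {
        junit.textui.TestRunner.run(suite());
    }
}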

Summary: Tuning with confidence

In this article we gained confidence to tune performance with impunity by writing automated tests that continually keep performance requirements in check. Before tuning to improve performance, we profiled our code to find hot spots worth optimizing. We then measured progress by running our performance tests after every change. This methodology for side-stepping performance tuning pitfalls is applicable to any type of performance tuning activity — EJB or otherwise. The bottom line: When making a case for or against applying changes in the name of performance, don’t assume facts not already in evidence. Test first, then tune with confidence!

This article was adapted from the book “Bitter EJB” (Manning, 2003).

Mike Clark is president of Clarkware Consulting based in Denver, CO. He is co-author of “Bitter EJB” (Manning, 2003) and has created several open source tools including JUnitPerf for automated performance testing. Two years ago he discovered the joy and power of test-driven development, and he hasn’t written code the same way since. Mike frequently writes and speaks on his experiences in the trenches helping teams build better software faster. He chronicles his “Aha!” moments on his weblog. Mike has been crafting software professionally since 1992, immersed in Java since 1997.
