Apache Camel Series: This post forms part of a series of blog posts about writing a “Real World Application” with Apache Camel, covering topics like Persistence, Testing, IoC, API Integration and “Big Data”. The series uses processing the NYC Trip Data as its example; you can follow my progress here.
TDD is often viewed as a cult! And with good reason: a lot of its practices seem counterproductive and outright absurd. Its devotees can come across as mumbling eccentrics extolling the benefits of the practice… I hope to avoid that, and instead highlight some of the benefits of the practice in a practical, non-dogmatic manner.
I’m going to assume you are familiar with unit testing and its benefits.
Why Test Driven Development?
Unit testing is important! It protects the system against unintentional changes in behaviour when we change our code. Code will change, and it is often not changed by the original author. It is really nice to have some code automatically tell me: “Hey! Are you sure you meant to change this?” That’s all a unit test is: it’s simply some code that says “Hey! This code is supposed to work like this”.
The problem is that unit testing is often an afterthought in the development cycle. This leads to unit tests being retrofitted onto code that was written days or weeks before, long after the context in which the code was written has passed. That context is vital to the process of writing good tests. The point of unit tests is to assert that aspects of the code work as intended. Assertions that seem obvious in the context the code was written in are often forgotten when tests are written later. All TDD is trying to do is get you to write your tests while that context is still in your head.
Test-driven dogma states: red, green, refactor! Red: write a failing test with some assertions about the code you want to write. Green: write just enough code to pass that test. Refactor: refactor the code to improve its design and remove duplication. Rinse and repeat until a fully functional system appears. Test-driven dogma is all YAGNI (You Aren’t Gonna Need It) and anti-BDUF (Big Design Up Front): let the tests drive the design.
WTF? This makes no sense! The idea that unit tests can design your code for you? Its benefits are often misunderstood, usually because what unit testing actually is gets misunderstood. But it provides an invaluable tool for keeping the design of your code loosely coupled and ready for the inevitable changes it will undergo in its lifetime.
Getting Started: Writing the first test
Now, I wrote a bunch of code without writing any tests at all, which is totally not TDD but is totally practical. I was figuring out how to get two libraries to play nicely together. I was not writing any sort of business logic; I was just figuring out how I would use Guice with Camel. I view TDD as a tool to validate that my code works as I expect it to; if I’m not sure how I expect my code to work, there is not much for me to test.
Once I’m happy with the direction I’m taking, the first test I always write is a simple integration test to validate that my application starts, does what I want and shuts down cleanly.
package me.martinrichards.apache.camel.address;
import ...
public class ApplicationIntegrationTest {
    @Test
    @Ignore
    public void testApplicationStartupUrl() throws Exception {
        Application.main("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-01.csv");
    }
}
This is not a unit test. The point of this test is to give me a quick and easy way to run my application and get a debugger on the code doing actual work: actually hitting a database, actually downloading the CSV file, etc. Most of the time I find it difficult to know what I need to write until I can look at the arguments I’ve got with actual values. What does a CSVParser look like with data in it? How are the headers in my CSV file reflected in the CSVRecord?
This is my poor man’s substitute for a REPL in Java: I need a way to interact and experiment with my code, and this test gives me that. But this is not something I want to run with my unit tests every time, so the test usually remains annotated with @Ignore.
If you’re working from an IDE like IntelliJ, NetBeans or Eclipse, the same thing can be achieved by just running the main application. But I must be some sort of TDD fanatic, as that didn’t occur to me at first.
So how is this going to work?
Apache Camel works with messages. The first message into the system will be a URL for the CSV that will be downloaded and processed. On startup, the application takes the list of URLs passed as arguments and pushes them into Camel.
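// Application.java
public void start(String... args) throws Exception {
    camel.start();
    // ProducerTemplate is Camel's standard API for sending a message into an endpoint
    ProducerTemplate producer = camel.createProducerTemplate();
    for (String arg : args) {
        producer.sendBody(AddressRoute.FROM, arg);
    }
}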
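The loop above turns each command-line argument into exactly one message. Here is a minimal, stdlib-only sketch of the same fan-out that can be exercised without a running CamelContext. `BodySender` is a hypothetical interface standing in for Camel’s `ProducerTemplate.sendBody(String, Object)`, and the endpoint URI `direct:csv-urls` is made up for illustration, since the real value of `AddressRoute.FROM` isn’t shown.

```java
import java.util.ArrayList;
import java.util.List;

class StartupSketch {

    /** Hypothetical stand-in for Camel's ProducerTemplate.sendBody(String, Object). */
    interface BodySender {
        void sendBody(String endpointUri, Object body);
    }

    /** Illustrative endpoint URI; the post's actual AddressRoute.FROM value is not shown. */
    static final String FROM = "direct:csv-urls";

    private final BodySender sender;

    StartupSketch(BodySender sender) {
        this.sender = sender;
    }

    /** Pushes every URL argument into the route as a separate message. */
    void start(String... args) {
        for (String arg : args) {
            sender.sendBody(FROM, arg);
        }
    }

    /** Runs start() against a recording sender and returns what was sent, as "uri <- body". */
    static List<String> sentBodies(String... args) {
        List<String> sent = new ArrayList<>();
        new StartupSketch((uri, body) -> sent.add(uri + " <- " + body)).start(args);
        return sent;
    }

    public static void main(String[] args) {
        for (String line : sentBodies("https://example.com/a.csv", "https://example.com/b.csv")) {
            System.out.println(line);
        }
    }
}
```

Passing the sender in through the constructor is the same move the post makes later with the CSV factory: the code that creates or talks to the framework object is swapped for something a test can inspect.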
The route takes each message and hands it off to the TaxiDataProcessor. This is where the business logic starts, and this is where we start to test. From a functional point of view, all the processor needs to do is extract the CSVRecords and pass them on for further processing.
public class TaxiDataProcessor implements Processor {
    @Override
    public void process(Exchange exchange) throws Exception {
        String file = (String) exchange.getIn().getBody();
        URL url = new URL(file);
        // Commons CSV's parse(URL, Charset, CSVFormat) opens and streams the remote file
        CSVParser parser = CSVParser.parse(url, Charset.defaultCharset(),
                CSVFormat.RFC4180.withHeader());
        for (CSVRecord record : parser) {
            exchange.getOut().setBody(record);
        }
    }
}
That’s a simple implementation that does what I need. Let’s write a test to verify the behaviour of the code.
package me.martinrichards.apache.camel.address.processor;
import ...
@RunWith(MockitoJUnitRunner.class)
public class TaxiDataProcessorTest extends BaseTestCase {
    @Mock
    Exchange exchange;
    @Mock
    Message message;
    @InjectMocks
    TaxiDataProcessor processor;
    @Before
    public void setup() {
        when(message.getBody()).thenReturn("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-01.csv");
        when(exchange.getIn()).thenReturn(message);
        // the processor writes to the out message, so that must be stubbed too
        when(exchange.getOut()).thenReturn(message);
    }
    @Test
    public void testProcess() throws Exception {
        // calls the real processor, which downloads and parses the real file
        processor.process(exchange);
        Mockito.verify(message, times(3))
                .setBody(any(CSVRecord.class));
    }
}
There is quite a lot going on in this first test, and quite a lot to digest, especially if you’re new to testing.
So what are all these @Mock annotations?
Mocking was a key breakthrough for me when it came to testing; before mocks, testing felt difficult, cumbersome and complicated. Having to build complex object hierarchies just to test a single aspect always felt too hard for what, at the end of the day, is very simple. Mocks are simply objects that pretend to be other objects. They allow you, as the tester, to define their behavior without having to build out the full object hierarchy required to get the desired behavior from the system under test. For a detailed discussion of mocking and how it fits into testing, I’d highly recommend Martin Fowler’s article Mocks Aren’t Stubs.
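To make “objects that pretend to be other objects” concrete, here is a deliberately tiny hand-rolled mock built on the JDK’s `java.lang.reflect.Proxy`. It is not how Mockito works internally (Mockito generates mock classes with its own bytecode machinery); it is just a sketch of the two things a mock does: return canned answers and record calls so they can be verified afterwards. The `Message` interface and the `echoThreeTimes` “system under test” are hypothetical, not Camel types. Note that JDK proxies can only implement interfaces, which is one more reason depending on your own interfaces makes code easier to fake.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class TinyMockSketch {

    /** Hypothetical stand-in for a framework message type (not org.apache.camel.Message). */
    interface Message {
        Object getBody();
        void setBody(Object body);
    }

    /** Returns a canned body for getBody() and records the name of every method called. */
    static final class RecordingHandler implements InvocationHandler {
        final List<String> calls = new ArrayList<>();
        private final Object cannedBody;

        RecordingHandler(Object cannedBody) {
            this.cannedBody = cannedBody;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) {
            calls.add(method.getName());
            // Only getBody() has a stubbed answer; void methods simply return null.
            return "getBody".equals(method.getName()) ? cannedBody : null;
        }

        /** Similar in spirit to Mockito's verify(mock, times(n)). */
        long timesCalled(String methodName) {
            return calls.stream().filter(methodName::equals).count();
        }
    }

    /** Creates an object that "pretends" to be a Message, backed by the handler. */
    static Message mock(RecordingHandler handler) {
        return (Message) Proxy.newProxyInstance(
                Message.class.getClassLoader(), new Class<?>[] {Message.class}, handler);
    }

    /** Toy system under test: reads the body once and writes it back three times. */
    static void echoThreeTimes(Message message) {
        Object body = message.getBody();
        for (int i = 0; i < 3; i++) {
            message.setBody(body);
        }
    }

    public static void main(String[] args) {
        RecordingHandler handler = new RecordingHandler("https://example.com/data.csv");
        echoThreeTimes(mock(handler));
        System.out.println("setBody called " + handler.timesCalled("setBody") + " times");
    }
}
```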
The above test simply mocks out a Camel message and calls the processor directly, as if the message had been routed by Camel. The processor will then create a real CSVParser and start downloading and processing the CSV, which is about 1GB: obviously something that should be avoided in a unit test.
The TaxiDataProcessorTest should only validate the behavior of the TaxiDataProcessor. What exactly is that behavior? Not downloading the CSV.
At a high level, the processor takes a Camel Message, extracts the useful data from it and passes it on to something that turns it into a list of records, which are then put on the Camel output. The Processor’s responsibility is to mediate between the Apache Camel framework and the business logic of the application.
Testing one unit at a time
The goal of a unit test, as implied by the name, is to test a single unit of the code at a time, making assertions about the unit’s expected behavior in an isolated manner. To achieve this, you’d ideally want to mock behavior that is external to the class. This is where creational design patterns come into play: since a new CSVParser needs to be made for each message (URL), we need a way to intercept the creation of the parser. To achieve that, let’s create a TaxiDataCSVFactory and pass it in via the TaxiDataProcessor’s constructor.
public class TaxiDataProcessor implements Processor {
    private final TaxiDataCSVFactory factory;
    // The factory is injected, so a test can supply a mock instead of a real parser
    public TaxiDataProcessor(final TaxiDataCSVFactory factory) {
        this.factory = factory;
    }
    @Override
    public void process(Exchange exchange) throws Exception {
        String file = (String) exchange.getIn().getBody();
        URL url = new URL(file);
        final TaxiDataCSV parser = factory.build(url);
        for (TaxiDataCSVRecord record : parser) {
            exchange.getOut().setBody(record);
        }
    }
}
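Stripped of Camel and Commons CSV, the factory seam above looks roughly like the stdlib-only sketch below. `RecordSourceFactory`, `RecordProcessor` and the `Consumer` standing in for the Camel output are hypothetical names for illustration. The point is that because the processor never calls `new` on its parser, a test can hand it a lambda that returns three canned records instead of downloading the real file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class FactorySeamSketch {

    /** Hypothetical seam: turns a URL string into records (the real one would download and parse). */
    interface RecordSourceFactory {
        Iterable<String> build(String url);
    }

    /** Mirrors the shape of TaxiDataProcessor: no parser is created inside, the factory is injected. */
    static final class RecordProcessor {
        private final RecordSourceFactory factory;

        RecordProcessor(RecordSourceFactory factory) {
            this.factory = factory;
        }

        /** Emits each record produced for the given URL to the downstream consumer. */
        void process(String url, Consumer<String> downstream) {
            for (String record : factory.build(url)) {
                downstream.accept(record);
            }
        }
    }

    /** In a test the factory is a lambda returning a fixed list, so nothing is downloaded. */
    static List<String> runWithFakeFactory(String url) {
        RecordSourceFactory fake = u -> Arrays.asList("r1", "r2", "r3");
        List<String> emitted = new ArrayList<>();
        new RecordProcessor(fake).process(url, emitted::add);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runWithFakeFactory("https://example.com/huge.csv"));
    }
}
```

With Mockito the lambda becomes a `@Mock` of the factory, but the design choice is the same: creation is pushed out of the class under test.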
Unfortunately, the CSVParser in Apache Commons CSV is a final class, so it cannot easily be mocked (at least not with Mockito’s default configuration). So in order to mock the CSVParser, it needs to be wrapped by our own class. From a purely design perspective, this means our business logic is completely separated from any direct reliance on external libraries. If we want to change CSV parsers in the future, all that is required is changing the parser our wrapper uses; the business logic of our application should remain untouched by the change of an underlying library.
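import java.io.IOException;
import java.net.URL;
import java.nio.charset.Charset;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
public class TaxiDataCSVFactory {
    // Opens the remote CSV, reading the header names from the first row
    public TaxiDataCSV build(URL url) throws IOException {
        return new TaxiDataCSV(CSVParser.parse(url, Charset.defaultCharset(),
                CSVFormat.RFC4180.withHeader()));
    }
}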
This does, however, require a little extra work up front in distilling the behavior of the system under test and how best to approach testing it.
public class TaxiDataCSV implements Iterable<TaxiDataCSVRecord> {
    // Built once: like the underlying CSVParser, this Iterable supports a single pass
    private final TaxiDataCSVIterator iterator;
    public TaxiDataCSV(Iterable<CSVRecord> parser) {
        this.iterator = new TaxiDataCSVIterator(parser);
    }
    @Override
    public Iterator<TaxiDataCSVRecord> iterator() {
        return iterator;
    }
}
The essence of the TaxiDataProcessor is taking a URL and looping over the records found at that URL. What you end up with is a TaxiDataCSVFactory that returns an Iterable. The Apache Commons CSVParser is already an Iterable<CSVRecord>, so we simply need to wrap it in our own Iterable of TaxiDataCSVRecords to allow the processor to loop through all the records returned by the parser.
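If you want a wrapper like TaxiDataCSV to behave like an ordinary `Iterable`, one option is to create a new iterator on every `iterator()` call rather than building a single one in the constructor; as written, a second loop over the same TaxiDataCSV would find its iterator already exhausted. The generic `MappedIterable` below is a stdlib-only sketch of that idea. The names are hypothetical and `TripRecord` merely stands in for TaxiDataCSVRecord. A streaming CSVParser can still only be read once, so this mainly matters for sources such as lists and test fixtures.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

class MappedIterableSketch {

    /**
     * Wraps any Iterable<S> and converts each element with the given function.
     * A fresh iterator is created on every iterator() call, so the wrapper can be looped over
     * more than once whenever the source allows it.
     */
    static final class MappedIterable<S, T> implements Iterable<T> {
        private final Iterable<S> source;
        private final Function<S, T> mapper;

        MappedIterable(Iterable<S> source, Function<S, T> mapper) {
            this.source = source;
            this.mapper = mapper;
        }

        @Override
        public Iterator<T> iterator() {
            Iterator<S> it = source.iterator(); // obtained once per traversal
            return new Iterator<T>() {
                @Override
                public boolean hasNext() {
                    return it.hasNext();
                }

                @Override
                public T next() {
                    return mapper.apply(it.next());
                }
            };
        }
    }

    /** Hypothetical domain wrapper standing in for TaxiDataCSVRecord. */
    static final class TripRecord {
        final String raw;

        TripRecord(String raw) {
            this.raw = raw;
        }
    }

    /** Loops over the same wrapper twice and counts the records seen. */
    static int countTwice(List<String> rows) {
        Iterable<TripRecord> trips = new MappedIterable<>(rows, TripRecord::new);
        int count = 0;
        for (TripRecord ignored : trips) count++;
        for (TripRecord ignored : trips) count++; // works because iterator() is fresh each time
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTwice(Arrays.asList("a,1", "b,2", "c,3"))); // prints 6
    }
}
```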
Even though the actual CSVParser is iterable, it is still a final class and cannot easily be mocked, so it still needs to be wrapped.
public class TaxiDataCSVIterator implements Iterator<TaxiDataCSVRecord> {
    // Obtained once: calling parser.iterator() on every hasNext()/next()
    // would restart iteration for any Iterable that returns a fresh iterator
    private final Iterator<CSVRecord> records;
    public TaxiDataCSVIterator(Iterable<CSVRecord> parser) {
        this.records = parser.iterator();
    }
    @Override
    public boolean hasNext() {
        return records.hasNext();
    }
    @Override
    public TaxiDataCSVRecord next() {
        // wrap each Commons CSV record in our own type
        return new TaxiDataCSVRecord(records.next());
    }
}
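Calling `iterator()` on the wrapped `Iterable` inside every `hasNext()`/`next()` call only works if the source hands back the same iterator each time. For an ordinary `List`, each call starts a new traversal, so `next()` returns the first element forever. The stdlib sketch below contrasts the two approaches with hypothetical class names; the safe version obtains the underlying iterator once and delegates to it.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class IteratorPitfallSketch {

    /** Re-fetches source.iterator() on every call, which restarts a List from the beginning. */
    static final class RefetchingIterator implements Iterator<String> {
        private final Iterable<String> source;
        RefetchingIterator(Iterable<String> source) { this.source = source; }
        @Override public boolean hasNext() { return source.iterator().hasNext(); }
        @Override public String next() { return source.iterator().next(); }
    }

    /** Obtains the underlying iterator once and delegates to it. */
    static final class DelegatingIterator implements Iterator<String> {
        private final Iterator<String> delegate;
        DelegatingIterator(Iterable<String> source) { this.delegate = source.iterator(); }
        @Override public boolean hasNext() { return delegate.hasNext(); }
        @Override public String next() { return delegate.next(); }
    }

    /** Concatenates at most 'limit' elements, so the broken iterator cannot loop forever. */
    static String firstN(Iterator<String> it, int limit) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < limit && it.hasNext(); i++) {
            out.append(it.next());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(firstN(new RefetchingIterator(rows), 5)); // prints aaaaa
        System.out.println(firstN(new DelegatingIterator(rows), 5)); // prints abc
    }
}
```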
The test can now simply mock out the expected behavior of the dependent objects and assert it responds as expected.
@RunWith(MockitoJUnitRunner.class)
public class TaxiDataProcessorTest extends BaseTestCase {
    @Mock
    Exchange exchange;
    @Mock
    Message message;
    @Mock
    TaxiDataCSV taxiDataCSV;
    @Mock
    TaxiDataCSVIterator iterator;
    @Mock
    TaxiDataCSVRecord record;
    @Mock
    TaxiDataCSVFactory factory;
    // Mockito passes the factory mock to TaxiDataProcessor's constructor
    @InjectMocks
    TaxiDataProcessor processor;
    @Before
    public void setup() {
        when(message.getBody()).thenReturn("https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2016-01.csv");
        when(exchange.getIn()).thenReturn(message);
        // the processor writes its results to the out message, so stub that as well
        when(exchange.getOut()).thenReturn(message);
    }
    @Test
    public void testProcess() throws Exception {
        // three records, then the iterator is exhausted
        when(iterator.hasNext()).thenReturn(true, true, true, false);
        when(iterator.next()).thenReturn(record);
        when(taxiDataCSV.iterator()).thenReturn(iterator);
        when(factory.build(any(URL.class))).thenReturn(taxiDataCSV);
        processor.process(exchange);
        Mockito.verify(message, times(3))
                .setBody(any(TaxiDataCSVRecord.class));
    }
}
An explosion of Classes
Now that escalated quickly! We went from simply trying to mock a single dependency to 5 additional classes isolating the parsing of the data from the routing of the data (keeping Apache Camel ignorant of Apache Commons CSV). This whole design was born out of the need to test the processor in an isolated manner. It quite neatly illustrates the advantages and disadvantages of using a test-driven approach to the design and development of software.
Simple is not Easy, and this makes it quite clear. There is a clean separation of concerns, with each class remaining simple by itself, but as a whole the design seems complex.
Balance
At the end of the day, testing is just another tool for creating reliable software. Finding the right place to use it is as important as knowing how to use it. Testing the processor as a single unit drove out a quite elegant design, demonstrating the power of taking a test-driven approach, but it seems like overkill for such a simple task, providing little benefit over testing it as a more complete unit using a fixture.
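The “more complete unit using a fixture” alternative could look something like the sketch below: instead of mocking every collaborator, the parsing code accepts a `java.io.Reader`, so production code can hand it the downloaded stream while a test hands it a few lines of text. Everything here is hypothetical and stdlib-only, and the comma splitting is deliberately naive (no quoted fields). With Commons CSV you could get the same seam by constructing a `CSVParser` from a `StringReader` in the test. The column names and values in the fixture are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FixtureSketch {

    /**
     * Hypothetical parsing seam that reads from any Reader, keyed by the header row.
     * Naive: splits on commas and does not handle quoted fields, unlike a real CSV parser.
     */
    static List<Map<String, String>> parse(Reader input) {
        List<Map<String, String>> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String headerLine = reader.readLine();
            if (headerLine == null) {
                return records; // empty input: no header, so no records
            }
            String[] headers = headerLine.split(",");
            String line;
            while ((line = reader.readLine()) != null) {
                String[] values = line.split(",", -1);
                Map<String, String> record = new LinkedHashMap<>();
                for (int i = 0; i < headers.length && i < values.length; i++) {
                    record.put(headers[i], values[i]);
                }
                records.add(record);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    /** A tiny fixture shaped like the real file; column names and values are illustrative. */
    static final String FIXTURE =
            "VendorID,passenger_count,trip_distance\n"
            + "1,2,1.10\n"
            + "2,1,0.90\n"
            + "2,5,3.70\n";

    public static void main(String[] args) {
        List<Map<String, String>> records = parse(new StringReader(FIXTURE));
        System.out.println(records.size() + " records, first distance = "
                + records.get(0).get("trip_distance"));
    }
}
```

The trade-off is the one described above: fewer classes and no mocks, at the cost of the test exercising the parsing code as well as the processor’s own logic.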
Writing good tests is hard! Finding the right balance between writing simple, maintainable code and simply getting things done is a true art form.