Building a Linux Media Network, one step at a time: WideFinder/Java6 Rev2

... The sum total of the next 6 paragraphs is, "I tested a theory without collecting performance data and my changes seemed to have no impact on performance"... Skip to the DTrace Analysis section if that doesn't sound like a fun read...

In the initial revision of the WideFinder, I couldn't get away from the thought that I was handling the data one too many times. If you look at WideFinder.run (WideFinder.java:57),

  byte[] seq = new byte[ seek - _mbb.position() ];
_mbb.get( seq );
nextLine = new String( seq, LOG_CHARSET );

We're grabbing bytes out of the buffer, sticking them into a fresh byte array, and creating a new String object based on those bytes, using a specific charset. This is because the character set used by CharBuffer treats each byte as half of a 16-byte char, which means that the bytes are interpreted as non-English characters.

Creating new byte[]s and new Strings like that, in a pretty tight loop, is just asking for performance trouble. I wanted to get to the point where the regex could be applied straight to the ByteBuffer. Looking at the implementation of Charset.encode, it seemed like it suffered from the same sort of space-inefficiency as my original approach.

The solution was to wrap the ByteBuffer in a trivial implementation of CharSequence that indexed directly into the current line of the buffer. The only gotcha was to use the mark() and reset() methods of the buffer in the implementation of subSequence.

Unfortunately, my efforts produced only a minimal increase in performance (~2%) on either my single-core Athlon or the T2K. So much for "low-hanging fruit".

DTrace Analysis
I created a new revision of the Java 6 WideFinder with some tweaks to the ByteBuffer implementation. It's available here: wf_02.tar.gz. It turned out to be a few hours' effort for a very minimal performance gain.

With that little misadventure under my belt I decided to run the WideFinder through DTrace to see where the bulk of time was being spent. DTrace is Sun's dynamic tracing tool, it allows you to inspect a bunch of different aspects of a program's performance as it is running. If you've never used it, you should really give it a try. It's pretty amazing, the stuff you get visibility into. I gather it's also available on OS X as the "Instruments" app.

This is the output of running the method-timings.d script against the WideFinder on a 100-line log file (produced by the DataGenerator). Times given are in microseconds and represent the total time spent in methods in classes of the given package:

sun/net/www/protocol/file         103
sun/net/www                       266
sun/net/www/protocol/jar          375
java/nio/channels                 474
java/util/concurrent/atomic       791
sun/security/action              1065
java/util/jar                    1731
java/util/zip                    3146
java/net                         3654
java/security                    3807
java/lang/ref                    5319
java/util/concurrent             7419
java/util/concurrent/locks       8329
java/lang/reflect               10365
sun/misc                        20678
java/nio/channels/spi           25753
sun/nio/ch                      32353
java/nio/charset               160325
java/util                      197794
sun/nio/cs                     204016
java/io                        240720
sun/reflect                    293949
wf                             464158
java/lang                     1061836
java/nio                      1596102
java/util/regex               2738455

The absolute values are not as meaningful as the relative values. The act of observing the program's performance has skewed the measurements (where's that damned cat!), but we hope that it's skewed the measurements equally for each package. From these data, we can see that the "big spenders" are java.util.regex (~39% of total time), java.nio (~25% of total time, with subpackages), and java.lang (~15% of total time).

Here are the 10 most expensive methods in java.nio (not including subpackages), also measured in microseconds:

CharBuffer.arrayOffset     43204
CharBuffer.<init>          44219
ByteBuffer.arrayOffset     51038
HeapCharBuffer.<init>      62420
CharBuffer.wrap           122109
Buffer.position           280288
Buffer.limit              286871
Buffer.checkIndex         375539
DirectByteBuffer.ix       485376
DirectByteBuffer.get     1084831

And here are the 10 most expensive methods in java.util.regex (although there seem to be a lot of very expensive methods in this package):

Pattern.escape                   165501
Pattern$CharProperty.<init>      180954
Pattern.peek                     184470
Pattern$8.isSatisfiedBy          199079
Pattern.isSupplementary          238639
Pattern.atom                     240097
Pattern$Slice.match              261159
Matcher.getSubSequence           282353
Matcher.group                    291688
Pattern$BmpCharProperty.<init>   291799
Pattern$BmpCharProperty.match   1080132

Looks like, if there's low-hanging fruit, it's in the regex processing. A little regex-optimization may go a long way.

Building a Linux Media Network, one step at a time

Tuesday, November 13, 2007

WideFinder/Java6 Rev2

0 Comments:

About

About Me

Previous