WideFinder/Java6 Rev2
... The sum total of the next 6 paragraphs is, "I tested a theory without collecting performance data and my changes seemed to have no impact on performance"... Skip to the DTrace Analysis section if that doesn't sound like a fun read...
In the initial revision of the WideFinder, I couldn't get away from the thought that I was handling the data one too many times. If you look at WideFinder.run (WideFinder.java:57),
Creating new byte[]s and new Strings like that, in a pretty tight loop, is just asking for performance trouble. I wanted to get to the point where the regex could be applied straight to the ByteBuffer. Looking at the implementation of Charset.encode, it seemed like it suffered from the same sort of space-inefficiency as my original approach.
The solution was to wrap the ByteBuffer in a trivial implementation of CharSequence that indexed directly into the current line of the buffer. The only gotcha was to use the mark() and reset() methods of the buffer in the implementation of subSequence.
Unfortunately, my efforts produced only a minimal increase in performance (~2%) on either my single-core Athlon or the T2K. So much for "low-hanging fruit".
DTrace Analysis
I created a new revision of the Java 6 WideFinder with some tweaks to the ByteBuffer implementation. It's available here: wf_02.tar.gz. It turned out to be a few hours' effort for a very minimal performance gain.
With that little misadventure under my belt I decided to run the WideFinder through DTrace to see where the bulk of time was being spent. DTrace is Sun's dynamic tracing tool, it allows you to inspect a bunch of different aspects of a program's performance as it is running. If you've never used it, you should really give it a try. It's pretty amazing, the stuff you get visibility into. I gather it's also available on OS X as the "Instruments" app.
This is the output of running the method-timings.d script against the WideFinder on a 100-line log file (produced by the DataGenerator). Times given are in microseconds and represent the total time spent in methods in classes of the given package:
The absolute values are not as meaningful as the relative values. The act of observing the program's performance has skewed the measurements (where's that damned cat!), but we hope that it's skewed the measurements equally for each package. From these data, we can see that the "big spenders" are java.util.regex (~39% of total time), java.nio (~25% of total time, with subpackages), and java.lang (~15% of total time).
Here are the 10 most expensive methods in java.nio (not including subpackages), also measured in microseconds:
In the initial revision of the WideFinder, I couldn't get away from the thought that I was handling the data one too many times. If you look at WideFinder.run (WideFinder.java:57),
byte[] seq = new byte[ seek - _mbb.position() ];We're grabbing bytes out of the buffer, sticking them into a fresh byte array, and creating a new String object based on those bytes, using a specific charset. This is because the character set used by CharBuffer treats each byte as half of a 16-byte char, which means that the bytes are interpreted as non-English characters.
_mbb.get( seq );
nextLine = new String( seq, LOG_CHARSET );
Creating new byte[]s and new Strings like that, in a pretty tight loop, is just asking for performance trouble. I wanted to get to the point where the regex could be applied straight to the ByteBuffer. Looking at the implementation of Charset.encode, it seemed like it suffered from the same sort of space-inefficiency as my original approach.
The solution was to wrap the ByteBuffer in a trivial implementation of CharSequence that indexed directly into the current line of the buffer. The only gotcha was to use the mark() and reset() methods of the buffer in the implementation of subSequence.
Unfortunately, my efforts produced only a minimal increase in performance (~2%) on either my single-core Athlon or the T2K. So much for "low-hanging fruit".
DTrace Analysis
I created a new revision of the Java 6 WideFinder with some tweaks to the ByteBuffer implementation. It's available here: wf_02.tar.gz. It turned out to be a few hours' effort for a very minimal performance gain.
With that little misadventure under my belt I decided to run the WideFinder through DTrace to see where the bulk of time was being spent. DTrace is Sun's dynamic tracing tool, it allows you to inspect a bunch of different aspects of a program's performance as it is running. If you've never used it, you should really give it a try. It's pretty amazing, the stuff you get visibility into. I gather it's also available on OS X as the "Instruments" app.
This is the output of running the method-timings.d script against the WideFinder on a 100-line log file (produced by the DataGenerator). Times given are in microseconds and represent the total time spent in methods in classes of the given package:
sun/net/www/protocol/file 103
sun/net/www 266
sun/net/www/protocol/jar 375
java/nio/channels 474
java/util/concurrent/atomic 791
sun/security/action 1065
java/util/jar 1731
java/util/zip 3146
java/net 3654
java/security 3807
java/lang/ref 5319
java/util/concurrent 7419
java/util/concurrent/locks 8329
java/lang/reflect 10365
sun/misc 20678
java/nio/channels/spi 25753
sun/nio/ch 32353
java/nio/charset 160325
java/util 197794
sun/nio/cs 204016
java/io 240720
sun/reflect 293949
wf 464158
java/lang 1061836
java/nio 1596102
java/util/regex 2738455
The absolute values are not as meaningful as the relative values. The act of observing the program's performance has skewed the measurements (where's that damned cat!), but we hope that it's skewed the measurements equally for each package. From these data, we can see that the "big spenders" are java.util.regex (~39% of total time), java.nio (~25% of total time, with subpackages), and java.lang (~15% of total time).
Here are the 10 most expensive methods in java.nio (not including subpackages), also measured in microseconds:
CharBuffer.arrayOffset 43204And here are the 10 most expensive methods in java.util.regex (although there seem to be a lot of very expensive methods in this package):
CharBuffer.<init> 44219
ByteBuffer.arrayOffset 51038
HeapCharBuffer.<init> 62420
CharBuffer.wrap 122109
Buffer.position 280288
Buffer.limit 286871
Buffer.checkIndex 375539
DirectByteBuffer.ix 485376
DirectByteBuffer.get 1084831
Pattern.escape 165501Looks like, if there's low-hanging fruit, it's in the regex processing. A little regex-optimization may go a long way.
Pattern$CharProperty.<init> 180954
Pattern.peek 184470
Pattern$8.isSatisfiedBy 199079
Pattern.isSupplementary 238639
Pattern.atom 240097
Pattern$Slice.match 261159
Matcher.getSubSequence 282353
Matcher.group 291688
Pattern$BmpCharProperty.<init> 291799
Pattern$BmpCharProperty.match 1080132
0 Comments:
Post a Comment
<< Home