I am building a web crawler to access data to be processed. All the code is fairly high level, so I am drawn to Python, but there are certain bits of it that require data manipulation that is much easier in a C-like language (arrays are a big part of it).
Java seems to fit this role very well. It is statically typed, object-oriented, and doesn't require manual memory management. However, it seems to get a lot of hate (or, at least, dismissal) from many programming communities, so I am asking: why not Java? Why is it considered so horrible as a systems language one level above C? Is there any other language that fits this role better?
I am asking in particular because I have been banging my head against Python's syntax for a while, and I am trying to expand the set of languages I can program in.
The hate for Java comes mostly from its use in application development. That is largely due to the kinds of applications typically written in Java (line-of-business software) and, most importantly, to the accidental complexity and low quality of APIs like Spring or J2EE.
The recipe for programming happiness is to use the right tool for the job:
* Python (or Ruby) for web application development, development tools, and "devops" scripting.
* C (or C++) for pieces that need deterministic performance[1], must provide a "native"-feeling user interface, or require control over memory layout.
Note: performance and efficiency are relative to your throughput and latency requirements. Google's crawlers and indexers will remain in C++ for the foreseeable future, but (for example) crawlers for an intranet can get away with being in Java (or Python, for that matter).
* Java (or Scala, Haskell, OCaml, Go, Erlang, or one of the many Lisps) for "userland" systems programming. If the majority of the system fits under the previous bullet point (C or C++), use C++.
* Avoid JNI or SWIG if you can. Use JSON + REST for cross-language RPC. If you need the performance guarantees of a tight binary protocol, use Thrift or Protocol Buffers. If you must call native code, consider JNA before reaching for JNI.
* No matter what language you use, stick to high-quality libraries and tools. For Java, you'll absolutely want to use Guava, Guice, and either Netty (or NIO.2 if you are using Java 7) or Jetty + Jersey + Jackson (for REST APIs).
Pick up Emacs with cscope, NetBeans, Eclipse, or IntelliJ for navigating a large Java codebase.
All Java build tools suck. Maven sucks less and is the de facto standard in the open source community. Twitter's "pants" is also worth looking at.
* Don't touch Spring with a 60-foot pole: in the mildest terms it's unequivocal and absolute garbage. Ditto for any other buzzword you may see in a job listing for an "enterprise" Java development job (with 20 years of experience required, naturally).
[1] Java performance can be quite high, but a JIT-ted and garbage collected runtime implies a lack of determinism.
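To make the "JSON + REST for cross-language RPC" advice above concrete, here is a minimal sketch using only the Python standard library. Everything named in it is hypothetical: the `/sum` endpoint, the `values` and `total` fields, and the `SumHandler` class are made up for illustration. The small HTTP server stands in for the component you would write in another language (a Java service built on Jetty + Jersey + Jackson, say), and summing a list is only a placeholder for the real array-heavy work.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, Request, build_opener


class SumHandler(BaseHTTPRequestHandler):
    """Stand-in for the component written in another language (e.g. Java)."""

    def do_POST(self):
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        # The array-heavy work would live here; summing is a placeholder.
        body = json.dumps({"total": sum(payload["values"])}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


# Ignore any system-wide HTTP proxy when talking to the local service.
_opener = build_opener(ProxyHandler({}))


def call_sum_service(port, values):
    """Client side: POST a JSON document and decode the JSON reply."""
    request = Request(
        f"http://127.0.0.1:{port}/sum",
        data=json.dumps({"values": values}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _opener.open(request, timeout=5) as response:
        return json.loads(response.read())


def demo():
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), SumHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return call_sum_service(server.server_address[1], [1, 2, 3])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

The crawler side only ever sees `call_sum_service`, so the service behind the URL can be rewritten in Java (or C++) later without touching the Python code. That is the main argument for this approach over JNI or SWIG bindings.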