Talend makes my teeth grind. I don't understand why an ETL tool uses a strongly typed language for a foundation. The number of fun productive hours I've spent swapping chars to varchars, int to decimal & vice versa. In 2017 computers can read a registration plate from a blurry photograph and spot a criminal in a stadium, but a user puts apostrophe in a CSV file and schmoo leaks everywhere.
I've been building ETL tools in my day to day work for the past 18 months and weak types are a disaster. In all of my tools as soon as I extract data, the first thing I do is establish its type. I have great, reusable tools for type conversation that are set by rules for load.
Any CSV you're working with should be properly escaped anyway or you're bound for a world of pain.
Maybe we have drastically different use cases, but dynamic and weak typing are a disaster in the data space. Not sure why people build production systems in such languages.
If it matters anywhere it matters in the data space. We don't see the disconnect between decimal and int, but when you're expecting a character and you get varchar, (not sure about the apostrophe case, but I suspect your talking about quotes and embedded commas) and the number of fields or composition of fields changes (e.g. col1:sttring "jack, dorsey" col2:int 156 and the parser sees col1:string "jack" col2: "dorsey, 156" you want to know that is broken ASAP.
If you enjoy making sure your peas don't touch the carrots, then sure, strong typing is great in the data space. But when you get woken up because a spreadsheet comes through as 11.0 rather than 11 or you have to type
Double.parseDouble(x) == Double.parseDouble(y)
/* instead of pythons */
x == y
27 times to get a feed file parsed, then I'd say that the tools are lacking basic useability. And especially primitive in light of the handwriting reading, photo tagging, go playing,supermario winning possibilities of ML.
That means you don't have a clean "domain model" (or business logic layer) and a data filtering layer that creates the domain objects. You should apply business logic on the pure objects/entities, and you should make sure that the filtering handles the representational problems (parsing, data integrity, partial data, and so on).
And ideally if something makes it through the data filtering layer into the logic layer and does not make sense there, then that should be handled. And that's where strong types help. It forces you to handle these cases, even if that means logging/alerting/ignoring, but at least you'll have to make a decision when you write the logic, instead of 3AM in the morning.
When a tool insist on you cast char fields to varchars before you can test two fields for equality, or keeps changing all your decimals to floats, how is that helping? I'm saying that if the underlying language was loosely typed, those kind of productitvy saps & bug fountains would not happen. In the few instance where you cared about type, a loosely typed language can usually offer something.
Last time I checked, enterprise ETL tools were sold as capable of a lot more than simple OLAP to OLTP. I find the reality provided is somewhat underwhelming. Given that facebook can tell the difference between a photo of Dave and one of Jim, why do I have to manually provide a mask for every single date field flowing through an enterprise?