Talend makes my teeth grind. I don't understand why an ETL tool uses a strongly ...

busterarm · on March 1, 2017

I've been building ETL tools in my day to day work for the past 18 months and weak types are a disaster. In all of my tools as soon as I extract data, the first thing I do is establish its type. I have great, reusable tools for type conversation that are set by rules for load.

Any CSV you're working with should be properly escaped anyway or you're bound for a world of pain.

frugalmail · on March 1, 2017

Maybe we have drastically different use cases, but dynamic and weak typing are a disaster in the data space. Not sure why people build production systems in such languages.

If it matters anywhere it matters in the data space. We don't see the disconnect between decimal and int, but when you're expecting a character and you get varchar, (not sure about the apostrophe case, but I suspect your talking about quotes and embedded commas) and the number of fields or composition of fields changes (e.g. col1:sttring "jack, dorsey" col2:int 156 and the parser sees col1:string "jack" col2: "dorsey, 156" you want to know that is broken ASAP.

gregn610 · on March 1, 2017

If you enjoy making sure your peas don't touch the carrots, then sure, strong typing is great in the data space. But when you get woken up because a spreadsheet comes through as 11.0 rather than 11 or you have to type

    Double.parseDouble(x) == Double.parseDouble(y) 
    /* instead of pythons */ 
    x == y

27 times to get a feed file parsed, then I'd say that the tools are lacking basic useability. And especially primitive in light of the handwriting reading, photo tagging, go playing,supermario winning possibilities of ML.

pas · on March 2, 2017

That means you don't have a clean "domain model" (or business logic layer) and a data filtering layer that creates the domain objects. You should apply business logic on the pure objects/entities, and you should make sure that the filtering handles the representational problems (parsing, data integrity, partial data, and so on).

And ideally if something makes it through the data filtering layer into the logic layer and does not make sense there, then that should be handled. And that's where strong types help. It forces you to handle these cases, even if that means logging/alerting/ignoring, but at least you'll have to make a decision when you write the logic, instead of 3AM in the morning.

vira · on March 1, 2017

Strong typing helps. Keep in mind enterprise ETL tools are designed to move data from oltp to olap databases, which are often strongly typed as well.

gregn610 · on March 1, 2017

When a tool insist on you cast char fields to varchars before you can test two fields for equality, or keeps changing all your decimals to floats, how is that helping? I'm saying that if the underlying language was loosely typed, those kind of productitvy saps & bug fountains would not happen. In the few instance where you cared about type, a loosely typed language can usually offer something.

Last time I checked, enterprise ETL tools were sold as capable of a lot more than simple OLAP to OLTP. I find the reality provided is somewhat underwhelming. Given that facebook can tell the difference between a photo of Dave and one of Jim, why do I have to manually provide a mask for every single date field flowing through an enterprise?

busterarm · on March 1, 2017

I don't use enterprise ETL tools. I pretty much write my own every time and occasionally I'll supplement that with things like Kiba.

frugalmail · on March 1, 2017

We're part of a large enterprise but we use Azkaban and Scala for our data movement. And strong typing is a must in our group.