More flexible Hadoop Cluster
I had an interesting scenario where a customer ran a Hadoop cluster based on Hortonworks. Their multi-PB HDFS cluster had done a lot of great work for them, but the amount of data they pushed into it every day, and the amount of hardware they had to buy to hold it all, was no longer cost effective. They run daily batch jobs rather than real-time analytics, which means they only need a certain amount of CPU and memory capacity to process the new data that comes in each day. But they keep all the old data, because more history gives much better analytic results.

A standard Hadoop design recommends local disks in each server and multiple copies of every file, both to protect the data and to get better performance than a single SATA disk can deliver. That recommendation exists because Hadoop was designed to run on cheap commodity server hardware. It's only now that Hadoop has become accepted and runs in large enterprise...
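To put a number on that storage overhead: with the default HDFS replication factor of three, every block is stored as three copies, so 1 PB of data consumes roughly 3 PB of raw disk. The sketch below is a minimal, illustrative example of how that factor shows up in the Hadoop client API; the path and the lowered factor of 2 are hypothetical, and whether reducing replication on cold data is acceptable depends entirely on the durability you need.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        // Standard HDFS design: each block is written as three copies
        // spread across DataNodes (dfs.replication = 3), which is why
        // 1 PB of data needs roughly 3 PB of raw disk.
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(conf);

        // One way to reclaim space on old data that is only kept for
        // historical analytics is to lower its replication factor.
        // The path and the factor of 2 are hypothetical examples.
        fs.setReplication(new Path("/data/archive/2015"), (short) 2);

        fs.close();
    }
}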