More flexible Hadoop Cluster
I had an interesting scenario where a customer ran a Hadoop cluster based on Hortonworks. Their multi-PB HDFS cluster had done a lot of great work for them, but the amount of data they pushed into it every day, and the amount of hardware they had to buy to hold it all, was no longer cost effective. They run daily batch jobs rather than real-time analytics, which means they only need a certain amount of CPU and memory capacity to process the new data that comes in each day. But they keep all the old data, because more history gives much better analytic results.

A standard Hadoop design recommends local disks in each server and multiple copies of every file, both to protect the data and to get better performance than a single SATA disk can deliver. That recommendation exists because Hadoop was designed to run on cheap commodity server hardware. It's only now that Hadoop has become accepted and runs in large enterprise...
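To put a number on that storage overhead: with the default HDFS replication factor of three, every block is stored as three copies, so 1 PB of data consumes roughly 3 PB of raw disk. The sketch below is a minimal, illustrative example of how that factor shows up in the Hadoop client API; the path and the lowered factor of 2 are hypothetical, and whether reducing replication on cold data is acceptable depends entirely on the durability you need.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSketch {
    public static void main(String[] args) throws Exception {
        // Standard HDFS design: each block is written as three copies
        // spread across DataNodes (dfs.replication = 3), which is why
        // 1 PB of data needs roughly 3 PB of raw disk.
        Configuration conf = new Configuration();
        conf.set("dfs.replication", "3");
        FileSystem fs = FileSystem.get(conf);

        // One way to reclaim space on old data that is only kept for
        // historical analytics is to lower its replication factor.
        // The path and the factor of 2 are hypothetical examples.
        fs.setReplication(new Path("/data/archive/2015"), (short) 2);

        fs.close();
    }
}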