Hadoop Summit 2013 – The Schedule

90% of winning is planning. I learned this as a kid from my dad, and I have validated it over many years of work in operations. This applies to everything in life, including conferences.

So in order to maximize fun, networking and learning at Hadoop Summit, I’m planning my schedule in advance, even if only a few hours in advance. It's the thought that counts.

In addition to social activities such as catching up with my former colleagues from Pythian, dining with my distributed solutions architecture team at Cloudera and participating in the Hadoop Summit bike ride, I’m also planning to attend a few sessions.

There are tons of good sessions at the conference, and it was difficult to pick. It is also quite possible that I’ll change my plans at the last minute based on recommendations from other attendees. For the benefit of those who would like some recommendations, or who want to catch up with me at the conference, here’s where you can find me:


11:20am: Securing the Hadoop Ecosystem – Security is important, but I’ll admit that I’m only attending this session because I’m a fan of ATM. Don’t judge.

12:00pm: LinkedIn Member Segmentation Platform: A Big Data Application - LinkedIn are integrating Hive and Pig with Teradata. Just the type of use case I’m interested in, from my favorite social media company.

2:05pm: How One Company Offloaded Data Warehouse ETL To Hadoop and Saved $30 Million - I’m a huge believer in offloading ETL to Hadoop and I know companies who saved big bucks that way. But $30M is more than even I’d expect, so I have to hear this story.

2:55pm: HDFS – What is New and Future - Everything in the Hadoop ecosystem relies on HDFS, so keeping up with new features is critical for Hadoop professionals. This should be a packed room.

4:05pm: Parquet: Columnar storage for the People - Parquet is a columnar storage format for Hadoop. I’m interested in learning more about Parquet, as it should enable a smoother transition of data warehouse workloads to Hadoop.


11:50am: Mahout and Scalable Natural Language Processing - This session promises so much data science content in one hour that I’m a bit worried my expectations are too high. I hope it doesn’t disappoint.

1:40pm: Hadoop Hardware @Twitter: Size does matter - I’m expecting an operations-level talk about sizing Hadoop clusters, something nearly every company adopting Hadoop struggles with.

2:30pm: Video Analysis in Hadoop – A Case Study - My former colleagues at Pythian are presenting a unique Hadoop use-case they developed. I have to be there.

3:35pm: Large scale near real-time log indexing with Flume and SolrCloud - I’m encountering this type of architecture quite often now. It will be interesting to hear how Cisco are doing it.

5:15pm: Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield - Using Hadoop to analyze large amounts of geospatial data and help farmers sounds interesting. Also, Robert is a customer and a very sharp software architect, so it's worth attending.

Expect me to tweet every 5 minutes from the sessions. It's my way of taking notes and sharing knowledge. If you are at the conference and want to get together, ping me here or on Twitter. Also ping me if you think I’m missing a not-to-be-missed session.

See you at Hadoop Summit!