When data is so huge that the usual approaches don’t work anymore.

  1. Hadoop has made storing a lot of data reliably (“resiliently”) very cheap.
  2. This data needs to be analysed to gain knowledge.
  3. This process should work automatically and in production.
  4. This process should work in real-time (streaming).