The Nightmare of Azkaban with Hive (Hadoop)

I have been working on a deployment of Azkaban for about a week now, and getting the server up and running was easy.  However, I have had major issues with Azkaban since day one.  Sharing them may help anyone else who is deciding whether to use it.

Pros:

  • It has dependency flows that are easy to use.
  • ACLs
  • Pretty Graphs
  • Scheduling (Kinda its purpose)
  • Good API
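
To show what I mean by dependency flows: each job is a small key=value file, and a job names the jobs it waits on in a `dependencies` property. The Python below is only a rough sketch of how a flow order falls out of those files. It is not Azkaban's own code; the job names and the `run_order` helper are made up for illustration, and the property names assume the layout described in the 2.5 documentation.

```python
from collections import deque

# Hypothetical job definitions, mimicking the key=value layout of .job files.
# The job names and commands are invented for this example.
JOB_FILES = {
    "extract": "type=command\ncommand=echo extract",
    "transform": "type=command\ncommand=echo transform\ndependencies=extract",
    "load": "type=command\ncommand=echo load\ndependencies=transform",
    "report": "type=command\ncommand=echo report\ndependencies=load,extract",
}


def parse_job(text):
    """Parse key=value lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def dependencies_of(props):
    """Return the set of job names listed in the comma-separated property."""
    raw = props.get("dependencies", "")
    return {name.strip() for name in raw.split(",") if name.strip()}


def run_order(job_files):
    """Return one valid execution order (Kahn's topological sort)."""
    deps = {name: dependencies_of(parse_job(text))
            for name, text in job_files.items()}
    for name, wanted in deps.items():
        missing = wanted - deps.keys()
        if missing:
            raise ValueError(f"{name} depends on unknown jobs: {sorted(missing)}")

    waiting = {name: len(wanted) for name, wanted in deps.items()}
    ready = deque(sorted(name for name, count in waiting.items() if count == 0))
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        # Release every job that was waiting on the one that just finished.
        for other in sorted(deps):
            if job in deps[other]:
                waiting[other] -= 1
                if waiting[other] == 0:
                    ready.append(other)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")
    return order


if __name__ == "__main__":
    print(run_order(JOB_FILES))
```

The point is that you only declare what each job waits on; the ordering (and the pretty graph) is derived from that.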

Cons:

  • Hive, Pig, and possibly other job types simply do not work.
    • After many hours of searching, I found there is a bug in the jobtypes plugin, and it has not been fixed.
      • You must completely recompile with the newer version to have these job types work.
  • The newest version of Azkaban is nowhere close to the version they have on their site.
    • It is also not compiled; you will have to build it manually.
  • The documentation is full of errors, bad links, and omissions (at best).
    • Not to mention it is all for 2.5, when 3.x is out.
    • The SSL keystore doc links are all bad (a major setup step).
  • No packages for YUM/APT/Zypper/etc.
    • I would have thought someone would have done this by now.
      • I created some using FPM.
  • No init script is provided. (Azkaban Init Script)

So, in closing, as you can see, the project has good intentions, but that’s about it.  It’s not ready for prime time, and they really need to get their stuff together.  If it were cleaned up, recompiled, and packaged, it would probably be an OK product.  However, the lack of organization and communication is what will prevent me from recommending it to any of my personal customers going forward.  I hope this helps anyone considering this product.

 

Sincerely,
Matthew Curry