Friday, August 28, 2015

AWS Datapipeline and Python Script

I had a task that needed to be done quickly. Then I learned that there is a language that can handle certain tasks more efficiently and easily than other languages, and it's called "Python".

Python has lots of powerful libraries for getting tasks done quickly, and it works well as a scripting language.
Although I had no prior knowledge of this wonderful language, I thought of giving it a try, and guess what! It is similar to other languages like Java and C#, and certain tasks take less time because Python has library support for so many of them.

My task was to do some manipulation of AWS resources and save the results in other AWS resources. Python has a powerful library called Boto, which is very easy to work with. Have a look at the Boto library here.
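As a rough illustration of that kind of task, here is a minimal sketch that reads an object from one S3 bucket, transforms it, and writes the result to another bucket. It assumes boto3 (the current successor to the Boto library) and its S3 client calls get_object and put_object. The function name, the bucket names and the keys are hypothetical, and the real script may have worked quite differently.

```python
def copy_transformed(s3, src_bucket, src_key, dst_bucket, dst_key, transform):
    """Read one S3 object, transform its bytes, and store the result elsewhere.

    `s3` is expected to behave like a boto3 S3 client, i.e. to offer
    get_object(Bucket=..., Key=...) and put_object(Bucket=..., Key=..., Body=...).
    Returns the number of bytes written.
    """
    response = s3.get_object(Bucket=src_bucket, Key=src_key)
    data = response["Body"].read()      # the whole object is loaded into memory
    result = transform(data)            # any bytes -> bytes function
    s3.put_object(Bucket=dst_bucket, Key=dst_key, Body=result)
    return len(result)


# Example usage (needs boto3 installed and AWS credentials configured;
# the bucket names and keys below are placeholders):
#
#   import boto3
#   s3 = boto3.client("s3")
#   copy_transformed(s3, "my-input-bucket", "raw/data.csv",
#                    "my-output-bucket", "clean/data.csv", bytes.upper)
```

Passing the client in as a parameter keeps the function easy to try out with a small fake object before pointing it at real buckets.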

I was able to complete my work quickly using Python and ended up with a Python script. The next requirement was to schedule this script so that it runs daily at a certain time and performs its task.

As we rely heavily on AWS resources for our work, it was not difficult to choose AWS Datapipeline to do this for us using EMR clusters.

So now I had everything I needed: my script was ready, and I could schedule it using AWS Datapipeline. But a question popped up in my mind: can a Python script actually be scheduled using Datapipeline?
FYI, I was also new to Datapipeline.

I decided to research that, and after a lot of effort (searching the internet ;) and various rounds of trial and error with the Datapipeline options), I was able to schedule my Python script, which uses the Boto library, on AWS Datapipeline.


So, here are the steps to schedule a Python script on AWS Datapipeline, so that it will be easy for you guys:
Step 1: Have your Python script ready.
Step 2: Have an AWS account and access to the console.
Step 3: Choose Datapipeline and start creating a datapipeline.
Step 4: Choose source as EmrActivity, provide the S3 path of your script in "input",
           and provide the output path to another S3 bucket location.
Step 5: In order to run Python on the EMR cluster, you need to add  "preStepCommand" : ""   .
Step 6: Choose EMR cluster and choose the desired hardware configuration.
Step 7: Schedule your job. You can also add preconditions so that Datapipeline checks that they are fulfilled before each run.
Step 8: Set up logs in your S3 logs directory so that you can check for problems in your job and debug issues using those logs.
Step 9: Set SNS topics and subscribe for job completion and job failure notifications.
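
The steps above can also be written down as a pipeline definition instead of being clicked together in the console. Below is a minimal sketch that builds such a definition as a Python dictionary and prints it as JSON. The object types and field names (EmrActivity, EmrCluster, Schedule, SnsAlarm, pipelineLogUri, preStepCommand and so on) are taken from the Datapipeline definition format as I understand it. The ids, instance types, start time and all argument values are placeholders, and the step string and the preStepCommand value have to be supplied by you. Input/output data nodes and preconditions are left out to keep it short.

```python
import json


def _sns_alarm(alarm_id, topic_arn, subject, message):
    """Build one SnsAlarm object (job success or failure notification)."""
    return {
        "id": alarm_id, "name": alarm_id, "type": "SnsAlarm",
        "topicArn": topic_arn, "subject": subject, "message": message,
        "role": "DataPipelineDefaultRole",
    }


def build_pipeline_definition(step, pre_step_command, log_uri, topic_arn,
                              start="2015-08-29T02:00:00", period="1 day"):
    """Return a pipeline definition that runs one EMR step on a schedule."""
    return {"objects": [
        {   # Pipeline-wide defaults, including the S3 log location.
            "id": "Default", "name": "Default",
            "scheduleType": "cron",
            "schedule": {"ref": "DailySchedule"},
            "pipelineLogUri": log_uri,
            "role": "DataPipelineDefaultRole",
            "resourceRole": "DataPipelineDefaultResourceRole",
        },
        {   # When the job runs.
            "id": "DailySchedule", "name": "DailySchedule",
            "type": "Schedule", "period": period, "startDateTime": start,
        },
        {   # The EMR cluster hardware (example values only).
            "id": "ScriptCluster", "name": "ScriptCluster",
            "type": "EmrCluster",
            "masterInstanceType": "m3.xlarge",
            "coreInstanceType": "m3.xlarge",
            "coreInstanceCount": "1",
            "terminateAfter": "1 Hour",
        },
        {   # The activity that runs the script on the cluster.
            "id": "RunPythonScript", "name": "RunPythonScript",
            "type": "EmrActivity",
            "runsOn": {"ref": "ScriptCluster"},
            "step": step,
            "preStepCommand": pre_step_command,
            "onSuccess": {"ref": "SuccessAlarm"},
            "onFail": {"ref": "FailureAlarm"},
        },
        _sns_alarm("SuccessAlarm", topic_arn, "Job finished",
                   "The scheduled Python script completed."),
        _sns_alarm("FailureAlarm", topic_arn, "Job failed",
                   "The scheduled Python script failed; check the S3 logs."),
    ]}


if __name__ == "__main__":
    definition = build_pipeline_definition(
        step="<your EMR step string>",            # placeholder
        pre_step_command="<your preStepCommand>",  # placeholder
        log_uri="s3://my-log-bucket/datapipeline/",
        topic_arn="arn:aws:sns:us-east-1:123456789012:my-topic",
    )
    print(json.dumps(definition, indent=2))
```

If you save the printed JSON to a file, it should be possible to load it into an existing pipeline with the AWS CLI command aws datapipeline put-pipeline-definition (using --pipeline-id and --pipeline-definition file://...), but please check the current documentation before relying on that.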

Finally, have fun and let Datapipeline do the hard work for you.



Saturday, March 23, 2013

Mutex-Simply



Mutex - I just came across a very simple definition and example of a mutex in Threading in C#, by Joe Albahari, so I thought I would share it.
A Mutex is like a C# lock, but it can work across multiple processes. In other words, Mutex can be computer-wide as well as application-wide.
Acquiring and releasing an uncontended Mutex takes a few microseconds — about 50 times slower than a lock.
With a Mutex class, you call the WaitOne method to lock and ReleaseMutex to unlock. Closing or disposing a Mutex automatically releases it. Just as with the lock statement, a Mutex can be released only from the same thread that obtained it.
A common use for a cross-process Mutex is to ensure that only one instance of a program can run at a time. Here’s how it’s done:
using System;
using System.Threading;

class OneAtATimePlease
{
  static void Main()
  {
    // Naming a Mutex makes it available computer-wide. Use a name that's
    // unique to your company and application (e.g., include your URL).

    using (var mutex = new Mutex (false, "oreilly.com OneAtATimeDemo"))
    {
      // Wait a few seconds if contended, in case another instance
      // of the program is still in the process of shutting down.

      if (!mutex.WaitOne (TimeSpan.FromSeconds (3), false))
      {
        Console.WriteLine ("Another app instance is running. Bye!");
        return;
      }
      RunProgram();
    }
  }

  static void RunProgram()
  {
    Console.WriteLine ("Running. Press Enter to exit");
    Console.ReadLine();
  }
}
If running under Terminal Services, a computer-wide Mutex is ordinarily visible only to applications in the same terminal server session. To make it visible to all terminal server sessions, prefix its name with Global\.


Note: For more in-depth knowledge, you can go to the link.