Jon Leaman

A Few CloudTrail Best Practices

There are many tools out there to help monitor and alert on AWS accounts, both native and third-party. Across every tool I have tested, one alert is always critical, and it is so easy to enable that there is no excuse not to have it on: CloudTrail.

CloudTrail, in short, logs every API call in your AWS account. The importance of this should be self-evident to my readers, so bear with me. If something bad happens in your account, you want to know who did it and when. You want an audit trail! Without CloudTrail, you’ll be flying without a rear-view mirror. You won’t have any hindsight to be 20-20 about, and that’s a shame.

At what cost? Basically the cost of storing the data in S3 and CloudWatch — minimal. If you’re not sold and you’ve read this, I’ll have no sympathy.

Moving on.

Historically, CloudTrail was enabled per region, which meant that when a new region came online, you had to remember to go and enable it there. If you didn't automate this, there was room for neglect. CloudTrail now has an ‘all-regions’ setting per trail. My recommendation is to create a new trail with all regions enabled. If this is as far as you go, that’s okay, but we can take it a step further.
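If you want to script this, the trail can be created with boto3's CloudTrail client. A minimal sketch — the trail and bucket names are placeholders, and the final call needs AWS credentials plus a bucket that already carries the CloudTrail bucket policy:

```python
# Sketch: create one trail that covers all regions, including future ones.
# Trail/bucket names are placeholders; swap in your own.
def multi_region_trail_params(trail_name, bucket_name):
    """Arguments for CloudTrail's create_trail call with all regions enabled."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,
        "IsMultiRegionTrail": True,          # log every region, current and future
        "IncludeGlobalServiceEvents": True,  # capture IAM/STS and other global calls
    }

params = multi_region_trail_params("all-regions-trail", "my-cloudtrail-logs")
# import boto3
# boto3.client("cloudtrail").create_trail(**params)  # requires AWS credentials
```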

At this point, you’ll have an S3 bucket logging all API calls across all regions and, in the event a new region comes online, that region too. Additionally, you can pipe these logs through CloudWatch, which I recommend. Most customers use only one to three regions, so if there is non-read-only activity in any of the other regions, you probably want to be alerted. I’m going to walk you through setting up alerts to answer the following question:

How can I be alerted of any activity in non-approved regions, with the exception of read-only calls?
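In code terms, the check boils down to a small predicate over each CloudTrail record. In practice you would express this as a CloudWatch Logs metric filter with an alarm on top, but a sketch makes the intent concrete — the `awsRegion` and `readOnly` fields come from the CloudTrail record format, while the approved-region list is an example you would replace with your own:

```python
APPROVED_REGIONS = {"us-east-1", "us-west-2"}  # example allow-list; use your own

def should_alert(record):
    """True for non-read-only activity outside the approved regions.

    CloudTrail records carry an 'awsRegion' string and, for most events,
    a 'readOnly' boolean; a missing flag is treated as not-read-only so
    suspicious events are never silently dropped.
    """
    outside = record.get("awsRegion") not in APPROVED_REGIONS
    read_only = record.get("readOnly", False)
    return outside and not read_only

# A write call in an unused region should fire an alert:
print(should_alert({"awsRegion": "eu-west-1", "readOnly": False}))  # True
```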


Jon Leaman

Lambda 101 – Serverless Business Logic

I’ll keep this post short and let the video do the talking. This twelve-minute video walks through three different Lambda examples and investigates the payloads of each. The goal is to get developers and system administrators comfortable using Lambda to execute business logic!

See below for more details. Enjoy!
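As a taste of what the video covers, here is about the smallest possible Lambda function in Python — my own minimal sketch, not the function from the video, with illustrative event fields. Lambda hands your handler the trigger’s payload as `event` plus a `context` object, and whatever you return becomes the response:

```python
import json

def lambda_handler(event, context):
    """Minimal handler: read a field from the event, return a JSON response."""
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": "Hello, %s!" % name}),
    }

# Invoking locally with a fake event shows the payload flow:
print(lambda_handler({"name": "Jon"}, None))
```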

Here are some additional references:

Function code: http://pastebin.com/mcAxrkdw

1) Jeff Barr’s Blog @ AWS is a good source for new announcements, interesting use cases, and much more: https://aws.amazon.com/blogs/aws/category/aws-lambda/

2) CloudSploit’s write up on how they made their whole company serverless with some insights on the savings they’ve seen:
https://medium.com/@CloudSploit/we-made-the-whole-company-serverless-5a91c27cd8c4

3) A deep dive into developing a serverless application and many of the considerations that need to be made. Written by Mike Watters (https://github.com/zerth): http://tech.adroll.com/blog/dev/2015/11/16/count-things-with-aws-lambda-python-and-dynamodb.html

4) Working with serverless applications is great, but how do you manage such an application over the lifecycle of the app? Michael Wittig (https://twitter.com/hellomichibye) answers this question on his blog: https://cloudonaut.io/the-life-of-a-serverless-microservice-on-aws/

5) Could it get any easier!? The innovation has just begun! Check out AWS’s Python Serverless Microframework: https://aws.amazon.com/blogs/developer/preview-the-python-serverless-microframework-for-aws/

I hope you enjoyed. If you have feedback or questions, leave them here!

Jon Leaman

Standing on the Edge of the Unknown

Tomorrow, I start working for Amazon Web Services!

In order to put my excitement in context, it’s important to know where I’m coming from and where I’m looking to go.

Five years ago, I started working for EMC.  EMC took a risk hiring me, but it paid off.  I was new to the workforce with no experience in storage and completely naive to the trends that shape the IT landscape.  I didn’t even know what they were asking me to do.  What does a pre-sales engineer do, exactly?  During my five years, EMC gave me access to all the resources I needed to be successful.  When I finally figured out how to do my job well, I had a new perspective on my career and the IT landscape as a whole.  I will never forget the people at EMC who helped me along the way.

Now that I saw things in a new light, I started asking myself how I could do it again.  This led to a multi-year search that ended two weeks ago with Amazon.  My criteria were as follows:

  • I must be able to provide value day one.  I’m good at understanding complex technologies, mapping the value of a technology to business needs, and messaging this value to different stakeholders.
  • They must have a sound strategy.  I was searching for a company that has the potential to be a market leader (or is one), and a poor strategy won’t get you there.
  • Located in Boston.

I profiled hundreds of companies and only four made it through my filter.  I didn’t obsess over my search, but I was always looking.  If anyone ever mentioned a company I hadn’t heard of, the next thing I would do was investigate them.  My notebook is filled with dozens of companies that didn’t make the cut.

Then I got the call.  Amazon wanted to talk to me?  I didn’t even have a warm introduction!  This was like MIT or Harvard approaching me to go to their school — I just couldn’t believe it.  I’m still having a hard time believing it.

There is always a degree of uncertainty that comes with changing roles.  I won’t have a support network and I know I don’t know a lot of things that I need to know to be successful at Amazon.  But I trust in myself to build my network within Amazon, educate myself on the gaps I have with the technology, make some friends along the way, and have fun doing it!

With every new piece of information I get my hands on, I am more certain that this is the right decision for me.  Have you read anything that Jeff Bezos has said?  Have you seen the new drone video Amazon just released?  Are you aware of just how many web services AWS offers?  AAHHHH, YES!!!!

Blue Origin employees celebrating a rocket landing.  This happened after I accepted the offer! (source: https://www.youtube.com/watch?v=igEWYbnoHc4)

Today I stand tall.  The energy is surging through my body!  I’m off to memorize Amazon’s Leadership Principles before my first day.  I’ll let you know how it goes!

Jon Leaman

The Cloud Landscape – Do I need a strategy?

Let’s take a minute to reflect on the IT industry.  Heraclitus had it right: “The only thing that is constant is change.”  We’ve all seen the disruptive waves of technology sweep through the IT world. With every wave there are winners and losers. Speaking directly to the compute environment, there have been mainframes, microprocessors, open systems, virtualization, and now cloud computing.
“I think there is a world market for maybe five computers.” — Thomas Watson, chairman of IBM, 1943. (The irony here is that IBM is the only company to survive every disruptive wave to date).


Jon Leaman

How I solved Aristotle’s Puzzle in 1 Hour (no spoilers)

I was given Aristotle’s Number game as a gift this Christmas.  The difficulty on the back read “Nightmarish”, but I wasn’t scared.  That said, my title may be misleading.  It took me 3-4 days to figure out all 12 solutions, but my script took only 1 hour and 3 minutes to run through the 250m permutations I was able to break it down into.  How this puzzle is solved without a computer is mind-boggling.

[Image: the game board]

The rules are simple.  You have 19 hex pieces, numbered 1-19.  They need to be placed on the board, seen above, so that all rows sum to 38.  Rows run in all three directions, giving 15 different rows that need to sum to 38.
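To check a candidate placement programmatically, those 15 rows can be encoded as index tuples over the 19 cells listed row by row (rows of 3, 4, 5, 4, 3 cells). The indexing below is my own assumption about the layout — any consistent scheme works:

```python
# The 15 lines of the board: 5 horizontals plus 5 diagonals in each of the
# two diagonal directions.  Cells are indexed 0-18 in row-major order.
LINES = [
    (0, 1, 2), (3, 4, 5, 6), (7, 8, 9, 10, 11), (12, 13, 14, 15), (16, 17, 18),
    (0, 3, 7), (1, 4, 8, 12), (2, 5, 9, 13, 16), (6, 10, 14, 17), (11, 15, 18),
    (2, 6, 11), (1, 5, 10, 15), (0, 4, 9, 14, 18), (3, 8, 13, 17), (7, 12, 16),
]

def is_solved(board):
    """board: the 19 tile values in row-major order; True iff every line sums to 38."""
    return all(sum(board[i] for i in line) == 38 for line in LINES)

print(is_solved(list(range(1, 20))))  # the naive 1..19 layout fails: False
```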

I started as anyone might: manually plugging away.  After getting ‘close’ a few times and failing, I figured I could brute-force this, so I opened Excel.  That tactic turned out to be laughable.  With 19! possible combinations, I needed to narrow down my search.  How many tiles did I need to solve to figure out the rest of the puzzle?  Well, after some failed attempts, I found it’s possible to take the 15 equations that should sum to 38 and reduce them down to 12 equations with 7 independent variables!  That means we only need to solve 19!/(19-7)! iterations.  The process used to derive these equations is called Gaussian Elimination.  Huge props to hwiechers (careful, this link contains a spoiler!) for showing me these equations; I had to take them at face value given my limitations in this area.
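The size of that reduction is easy to sanity-check: a full brute force over tile orderings is 19!, while fixing only the 7 independent tiles leaves 19!/(19-7)! orderings, right around the 250 million my script chewed through. A quick check:

```python
import math

naive = math.factorial(19)   # every ordering of all 19 tiles
reduced = math.perm(19, 7)   # orderings of just the 7 independent tiles
print("%s -> %s" % (format(naive, ","), format(reduced, ",")))
# 121,645,100,408,832,000 -> 253,955,520
```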

Once I had these equations, it didn’t take much time at all to write the script to take the iterations, solve the equations, and check whether everything added up.  My code can be found here: https://github.com/tsunamitreats/aristotlesNumberGame/blob/master/equationPerms.py

Boy am I happy to be done with this puzzle!

Cheers and have a happy New Year!


Jon Leaman

Coursera: A Survival Guide

It was the first week of the new year, and in the spirit of resolutions and new beginnings, I wrote down all the goals I could think of.  After some refining, one of my goals was to take an online class.  I had a few classes in mind and settled on the one with the earliest start date.  Now that my class has come to a close, I want to share my tips on surviving it and how to get the most out of it.

Iconic pencils that you won’t be needing.

The class I settled on was called ‘Computing for Data Analysis’, offered via Coursera.org.  The reason this class was on my short list was that it applied to my professional life and my personal projects.  This class teaches you the basics of the statistical programming language ‘R’.  For those of you not familiar with R, it is basically a command-line interface for spreadsheets that makes it ‘easy’ to generate graphical representations of your large data sets.  My definition is a simplified one that doesn’t do R justice, but for the layman, I think it does the job.

Professionally, I wanted to be able to have conversations with Data Scientists.  My personal projects include manipulating and deriving value from VC/Angel and startup data, and doing so in a repeatable fashion.  I wanted to be able to take data collected via APIs, Excel spreadsheets, and websites and manipulate it in order to answer questions.  Basically, I have multiple pools of data that I need to sift through to answer some very specific questions for myself.  If you want more details, Jamie Davidson wrote a really cool article that tries to answer “How much venture money should a startup raise to be successful?”  He even published his R code so you can reproduce his research.  Cool stuff!  The point being, you need some motivation to get through this (or any) class.  If you don’t care about the content being taught, then you aren’t very likely to complete the class.  I was fortunate enough to have selected a class that was offering a “Signature Track”, which basically means that the school administering the class, Johns Hopkins, acknowledges your work in the class and you can add a line item to your LinkedIn page.  This wasn’t a motivator for me, but since I had already resolved to complete the class, spending the $50 to earn the certificate was a no-brainer.

So the class is set on a weekly schedule that goes something like this:

  • Weekly video content published.
  • Weekly quiz available.
  • Weekly programming assignment available.
  • Repeat.

Candidly, I’m not a programmer or a developer or a data scientist.  I was definitely in over my head, and very thankful to have taken the Python course offered by Codecademy prior to this course, so I knew a little about syntax and how to ‘talk to a computer’.  The 3-5 hours estimated per week was very low for my learning curve.  I needed to get up to speed, and fast.

Without a doubt, the most difficult part of programming is knowing how to ask your question.  Too many questions are asked with hidden assumptions and misaligned expectations.  For this reason, the class forum can be a very noisy place and, if you already know how to ask your question correctly, often not the best resource for getting your answer.  That’s why official documentation and established forums are my go-to resources for specific questions.  The forum is great because it is moderated by class TAs who can help answer theoretical questions or help you rethink your problem to get you to the right question.  Lastly, if and only if you know what you don’t know, you can leverage the IRC chat rooms.

So if you have the desire and the right resources available to you, I think you’ll be fine.  The last thing I want to leave you with is networking.  If I went to grad school, a big reason would be the network.  I want to meet people with a passion for success who are on their way up.  The same holds true for online courses.  I took it upon myself to start the ‘Computing for Data Science – Class of 1/2014’ group on LinkedIn.  If your class is small (under ~50 students), you could just share your LinkedIn profiles, but trying to connect with 70k other students is a big data problem in and of itself…  So if you are taking a course, set up a place for your class to network with each other after the class is over.

I would recommend ‘Computing for Data Analysis’ for anyone interested in data manipulation, data science, or big data.  It is a Signature Track and also gives you credit toward Coursera’s ‘Data Science Specialization,’ which could be a differentiator on a resume.


Jon Leaman

NSA and the US Cloud Industry

Following the news around the NSA can be depressing.  Speculating on the future often leads down a pessimistic hole.  I want to share the light at the end of the tunnel I have seen.

As you may be able to tell from my intro, I’ve gone through a range of emotions regarding the NSA and all the surrounding news.  I first started caring about internet privacy and openness with the SOPA and PIPA acts.  I grew up with the internet and saw SOPA and PIPA as a way for Hollywood to do to the internet what the FCC did to radio and television.  The reason this battle was unprecedented is that the restrictions set on radio and television had a technical argument due to contention over airwaves.  There is no technical contention for resources on the internet like there was with radio and TV.  Thankfully, these bills were stopped in Congress, but this is an ongoing battle.

My internet-activist fire was lit, and many petitions against internet restriction bills had already been signed, when the leaks started coming out around the NSA.  There was a lot of suspicion about the mass surveillance, but no one knew the extent of its reach.  The biggest news, in my opinion, to come out of the leaks is that the NSA was actively looking to undermine encryption standards with backdoors and that the US government was willing to shut down US-based companies that offered truly secure communications as a service (see the tragic end of Lavabit and Silent Circle, two young companies with great promise).  I was emotional about this, but I struggled to find a pragmatic footing.

While I don’t agree with or condone the NSA’s actions, there is a conservative argument to be made for the NSA’s programs being in place.  After all, we are a world leader and we should look for advantages to stay relevant on the economic stage.  Not to mention the fact that private companies are selling mass surveillance tools to companies and governments all around the world.  It wouldn’t be fair if the NSA didn’t monitor its citizens!

As individuals, companies, and non-US governments, how should we proceed in a world like this?  Well, there are two major problems to be solved.  One is that if privacy is going to be a concern, full-stack open-source encryption needs to be more easily available to the masses.  Encryption is still mathematically proven to be secure.  If open-source encryption is implemented with care (open source from the ground up), it makes snooping infeasible.  The second issue is trust in US service providers.  The PRISM program alone is estimated to cost the US public cloud industry $35-180 billion over the next three years.  To address the root of the problem, the US would need to rein in the NSA’s jurisdiction over private companies’ data.  There has been progress made in the past 24 hours, but in the event that that doesn’t happen, there is a workaround.  The workaround is for companies to expand their product offerings from public cloud services to onsite private cloud deployments.  Salesforce, as an example, would need to build out a private cloud offering for companies who aren’t comfortable letting their data move outside their data center’s walls (or host country’s borders).  For Salesforce and their ‘No Software’ slogan, this is less than ideal.

So encryption and changes to existing services are a nice first step toward regaining trust in cloud service offerings, but I think the real transition will happen when public service providers decouple their services from the infrastructure where the data lies.  I think we will see a shift toward cloud service providers becoming cloud-infrastructure agnostic.  There is a subtle but important difference between the workaround suggested above (selling public services into private clouds) and decoupling public cloud services from the infrastructure they run on.  Using Salesforce again: if they decoupled their service from their infrastructure, they could operate just like they do today, except there would be a setting to configure where the storage comes from (e.g. a specific Salesforce DC, a private cloud, AWS, etc.).

I have a feeling that this type of offering will become more prevalent as US companies continue to compete on the world stage.  How do you see the IT industry reacting to these leaks?

Jon Leaman

Nimble and the Hybrid/Flash Array Market

First and foremost, I would like to welcome Nimble to the public market and thank them.  I credit their successful entrance with the bump across the entire tech stock sphere and the increased awareness of the hybrid/flash array (if anyone hasn’t heard about these yet…).  Secondly, I think competition is a good thing, especially in an emerging market.  It keeps everyone honest and promotes innovation.

From a high level, Nimble has many things going for them.  They have had a product in the market for a while now, a (reportedly) growing list of customers, and a successful IPO in a red-hot market.  That said, they have many challenges they still need to face.  In my opinion, their IPO is a symptom of their biggest challenge: an exit strategy.  It’s a loaded comment, I know, but I believe the most viable exit for an all-flash start-up is acquisition.  But even if they wanted to be acquired, all the giants have already placed their bets on other startups, and some even decided to develop their own technology instead of buying one.  Nimble had to go public.  Whether their hand was forced or they did so of their own free will, their IPO was a success.  Time will tell if they can convert their recent wins into a viable long-term business.  Their first test will be their quarterly results (their numbers close Jan. 31), but their long-term viability will be evident in their strategic vision and their ability to execute against it.

How do I really feel?  There is a war underway in the flash space.  Here are some highlights from the second half of this year: IBM acquires Texas Memory, Violin Memory goes public, EMC’s XtremIO becomes generally available, and Nimble goes public.  Without a doubt, there is a lot of hype in this space, there are no clear winners yet, and there are real-world use cases and workloads demanding this technology.  Will Nimble be able to play with the big boys?  Ultimately, time will tell, but I’ll remain skeptical for now.

Jon Leaman

EMC’s ScaleIO vs VMware’s VSAN

VSAN is a hot topic today! Without getting into the weeds, I want to briefly describe the similarities between these two technologies and how they can help enable your IT department, followed by their differences.

Put simply, EMC products will be hypervisor agnostic and VMware products will be storage agnostic. Ask anyone (including analysts, but probably not NetApp), and they’ll say that EMC and VMware products work best hand-in-hand, but they are free to move about each other’s competitors. While this strategy can be a pain in the side of EMC and VMware, it is ultimately a great source of power. This relationship holds true for ScaleIO and VSAN.

Both of these technologies essentially do the same thing: build a virtual SAN. By pooling the storage presented to a group of hosts, a virtual SAN is created and then shared within a cluster. ScaleIO is hypervisor agnostic and can even support physical servers. VSAN, on the other hand, is ESX specific. And given that these technologies are built on direct-attached storage, VSAN being ‘storage agnostic’ isn’t much of an advantage in this case. You could say VSAN is like ScaleIO if it were only licensed for VMware…

There are other differences. Both technologies enable hybrid storage (e.g. a combination of flash and spinning disk) in different ways. The architecture is different (VSAN’s code lives in the kernel, whereas ScaleIO needs its own virtual machine). ScaleIO can scale larger today. And the list goes on.

Candidly, I believe ScaleIO is a better product because its users will have a lot of options in deployment and supported infrastructure. That said, if you are a VMware-only shop and you aren’t plagued with excessive scale, then you already have this code on your ESX clusters; you just need to license it.