Postcard from Another Dimension

My GPU took 45 minutes to generate these beautiful people from nothing. Well, not nothing. It was a Generative Adversarial Network. What’s that, you ask? It’s an architecture for a neural network. NVIDIA researchers built on this architecture and showed how, with just a bit more hardware, you can generate photo-realistic people who never existed.

Pretty neat. I’ll fool around with this a bit to see if I can optimize it, but I’m also looking forward to testing it on other media.

Huge shout out to Jeff Heaton for the YouTube tutorial! I’m looking forward to trying this out on other data sets.

A Few CloudTrail Best Practices

There are many tools out there to help monitor and alert on AWS accounts, both native and third-party. Across every tool I have tested, one alert is always critical and it’s so easy to fix that there is no excuse not to have it on — CloudTrail.

CloudTrail, in short, logs every API call in your AWS account. The importance of this should be self-evident to my readers, so bear with me. If something bad happens in your account, you want to know who did it and when. You want an audit trail! Without CloudTrail, you’ll be flying without a rear-view mirror. You won’t have any hindsight to be 20-20 about, and that’s a shame.

At what cost? Basically the cost of storing the data in S3 and CloudWatch — minimal. If you’re not sold and you’ve read this, I’ll have no sympathy.

Moving on.

Historically, CloudTrail was enabled per region. This meant that when a new region came online, you had to remember to go and enable it in that region. If you didn’t automate this, there was room for neglect. CloudTrail now has an ‘all-regions’ setting per trail. My recommendation is to create a new trail that has all-regions enabled. If this is as far as you go, that’s okay, but we can take it a step further.
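For anyone who would rather script this than click through the console, here is a minimal sketch using boto3. The helper names, the trail name, and the bucket name are all placeholders of mine, and it assumes the S3 bucket already exists with a policy that lets CloudTrail write to it.

```python
def multi_region_trail_params(trail_name, bucket_name):
    """Build the arguments for a trail that covers every region."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,  # assumed to exist and allow CloudTrail writes
        "IsMultiRegionTrail": True,  # the 'all-regions' setting
        "IncludeGlobalServiceEvents": True,  # also capture global services such as IAM
    }


def create_multi_region_trail(trail_name, bucket_name):
    """Create the trail and start logging (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the helper above works without boto3 installed

    cloudtrail = boto3.client("cloudtrail")
    cloudtrail.create_trail(**multi_region_trail_params(trail_name, bucket_name))
    cloudtrail.start_logging(Name=trail_name)


# Inspect the settings without touching AWS.
print(multi_region_trail_params("all-regions-trail", "my-cloudtrail-bucket"))
```

Calling create_multi_region_trail with your own names is the part that actually changes your account, so try it somewhere safe first.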

At this point, you’ll have an S3 bucket that is logging all API calls across all regions, and in the event a new region comes online, that region too. Additionally, you can pipe these logs through CloudWatch, which I recommend. Typically, most customers only use 1-3 regions, so if you have non-readonly activity in any of the other regions, you probably want to be alerted. I’m going to walk you through setting up alerts to answer the following question:

How can I be alerted of any activity in non-approved regions, with the exception of read-only calls?
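To make the question concrete, here is a rough sketch of the decision itself in Python, applied to CloudTrail records that have already been parsed into dictionaries. It is an illustration rather than the alerting setup itself: the approved region list and the function names are made up, and the name-prefix fallback for records without a readOnly field is only a heuristic.

```python
APPROVED_REGIONS = {"us-east-1", "us-west-2"}  # hypothetical approved list

# Fallback heuristic, used only when a record has no readOnly field.
READ_ONLY_PREFIXES = ("Describe", "Get", "List")


def is_read_only(record):
    """Return True if the CloudTrail record describes a read-only call."""
    flag = record.get("readOnly")
    if flag is not None:
        # Tolerate both JSON booleans and the strings "true"/"false".
        return str(flag).lower() == "true"
    return record.get("eventName", "").startswith(READ_ONLY_PREFIXES)


def should_alert(record, approved_regions=APPROVED_REGIONS):
    """Alert on any non-read-only call made outside the approved regions."""
    outside = record.get("awsRegion") not in approved_regions
    return outside and not is_read_only(record)


records = [
    {"awsRegion": "us-east-1", "eventName": "RunInstances", "readOnly": False},
    {"awsRegion": "ap-southeast-2", "eventName": "DescribeInstances", "readOnly": True},
    {"awsRegion": "ap-southeast-2", "eventName": "RunInstances", "readOnly": False},
    {"awsRegion": "eu-west-1", "eventName": "ListBuckets"},
]
alerts = [r["eventName"] for r in records if should_alert(r)]
print(alerts)  # ['RunInstances']
```

Only the RunInstances call in ap-southeast-2 trips the alert: it is outside the approved regions and it changes something.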

Lambda 101 – Serverless Business Logic

I’ll keep this post short and let the video do the talking. This twelve-minute video will walk through three different Lambda examples and investigate the payloads of each. The goal is to get developers and system administrators comfortable with using Lambda to execute business logic!
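If you want to poke at payloads yourself, a minimal Python handler along these lines is one way to start. This is a sketch of mine, not the code from the video, and it assumes the event arrives as a dictionary, which depends on what triggers the function.

```python
import json


def lambda_handler(event, context):
    """Log the incoming payload and summarize its top-level keys."""
    # Printed output ends up in the function's CloudWatch Logs stream.
    print(json.dumps(event, indent=2, default=str))
    return {"top_level_keys": sorted(event.keys())}


# Local dry run with a made-up payload; no AWS account needed.
print(lambda_handler({"source": "manual-test", "detail": {"id": 1}}, None))
```

Running it locally with a hand-written event is a cheap way to see the shape of a payload before wiring up a real trigger.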

See below for more details. Enjoy!

Here are some additional references:

Function code: http://pastebin.com/mcAxrkdw

1) Jeff Barr’s Blog @ AWS is a good source for new announcements, interesting use cases, and much more: https://aws.amazon.com/blogs/aws/category/aws-lambda/

2) CloudSploit’s write-up on how they made their whole company serverless with some insights on the savings they’ve seen:
https://medium.com/@CloudSploit/we-made-the-whole-company-serverless-5a91c27cd8c4

3) A deep dive into developing a serverless application and many of the considerations that need to be made. Written by Mike Watters (https://github.com/zerth): http://tech.adroll.com/blog/dev/2015/11/16/count-things-with-aws-lambda-python-and-dynamodb.html

4) Working with serverless applications is great, but how do you manage such an application over the lifecycle of the app? Michael Wittig (https://twitter.com/hellomichibye) answers this question on his blog: https://cloudonaut.io/the-life-of-a-serverless-microservice-on-aws/

5) Could it get any easier!? The innovation has just begun! Check out AWS’s Python Serverless Microframework: https://aws.amazon.com/blogs/developer/preview-the-python-serverless-microframework-for-aws/

I hope you enjoyed. If you have feedback or questions, leave them here!

Standing on the Edge of the Unknown

Tomorrow, I start working for Amazon Web Services!

In order to put my excitement in context, it’s important to know where I’m coming from and where I’m looking to go.

Five years ago, I started working for EMC.  EMC took a risk hiring me, but it paid off.  I was new to the workforce with no experience in storage and completely naive to the trends that shape the IT landscape.  I didn’t even know what they were asking me to do.  What does a pre-sales engineer do, exactly?  During my five years, EMC gave me access to all the resources I needed to be successful.  When I finally figured out how to do my job well, I had a new perspective on my career and the IT landscape as a whole.  I will never forget the people at EMC who helped me along the way.

Now that I saw things in a new light, I started asking myself how I could do it again.  This led to a multi-year search that ended two weeks ago with Amazon.  My criteria were as follows:

  • I must be able to provide value on day one.  I’m good at understanding complex technologies, mapping the value of the technology to business needs, and messaging this value to different stakeholders.
  • They must have a sound strategy.  I was searching for a company that has the potential to be a market leader (or is one), and having a poor strategy won’t get you there.
  • They must be located in Boston.

I profiled hundreds of companies and only four made it through my filter.  I didn’t obsess over my search, but I was always looking.  If anyone ever mentioned a company I hadn’t heard of, the next thing I would do was investigate them.  My notebook is filled with dozens of companies that didn’t make the cut.

Then I got the call.  Amazon wanted to talk to me?  I didn’t even have a warm introduction!  This was like MIT or Harvard approaching me to go to their school — I just couldn’t believe it.  I’m still having a hard time believing it.

There is always a degree of uncertainty that comes with changing roles.  I won’t have a support network and I know I don’t know a lot of things that I need to know to be successful at Amazon.  But I trust in myself to build my network within Amazon, educate myself on the gaps I have with the technology, make some friends along the way, and have fun doing it!

With every new piece of information I get my hands on, I am more certain that this is the right decision for me.  Have you read anything that Jeff Bezos has said?  Have you seen the new drone video Amazon just released?  Are you aware of just how many web services AWS offers?  AAHHHH, YES!!!!

Blue Origin employees celebrating a rocket landing.  This happened after I accepted the offer! (source: https://www.youtube.com/watch?v=igEWYbnoHc4)

Today I stand tall.  The energy is surging through my body!  I’m off to memorize Amazon’s Leadership Principles before my first day.  I’ll let you know how it goes!

The Cloud Landscape – Do I need a strategy?

Let’s take a minute to reflect on the IT industry.  Heraclitus had it right: “The only thing that is constant is change.”  We’ve all seen the disruptive waves of technology sweep through the IT world. With every wave there are winners and losers. Speaking directly to the compute environment, there have been mainframes, microprocessors, open systems, virtualization, and now cloud computing.
“I think there is a world market for maybe five computers.” — Thomas Watson, chairman of IBM, 1943. (The irony here is that IBM is the only company to survive every disruptive wave to date).

How I solved Aristotle’s Puzzle in 1 Hour (no spoilers)

I was given Aristotle’s Number game as a gift this Christmas.  The difficulty on the back read “Nightmarish”, but I wasn’t scared.  That said, my title may be misleading.  It took me 3-4 days to figure out all 12 solutions, but my script only took 1 hour and 3 minutes to run through the roughly 250 million permutations I was able to break it down into.  How this puzzle is solved without a computer is mind-boggling.

The rules are simple.  You have 19 hex pieces that have a number 1-19 on them.  They need to be placed on the board, seen above, so that all rows sum to 38.  That’s in any direction, leaving 15 different rows that need to sum to 38.
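Here is a small sketch of how those rules can be checked in Python. It is not my solver, just one way to model the board: the coordinate scheme and the function names are choices made for this example. It also hints at why 38 is the magic number: the tiles 1 through 19 add up to 190, and the five horizontal rows together use every tile exactly once, so each row must sum to 190 / 5 = 38.

```python
from collections import defaultdict

TARGET = 38

# Axial coordinates (q, r) for a hexagon with 3 cells per side: 19 cells in total.
CELLS = sorted(
    ((q, r) for q in range(-2, 3) for r in range(-2, 3) if abs(q + r) <= 2),
    key=lambda cell: (cell[1], cell[0]),  # top row first, left to right
)


def build_lines():
    """Group cell indices into the 15 straight lines of the board."""
    groups = defaultdict(list)
    for index, (q, r) in enumerate(CELLS):
        groups[("q", q)].append(index)  # one diagonal direction
        groups[("r", r)].append(index)  # horizontal rows
        groups[("s", q + r)].append(index)  # the other diagonal direction
    return list(groups.values())


LINES = build_lines()


def is_solved(board):
    """board lists the 19 tile values in CELLS order (row by row)."""
    uses_each_tile_once = sorted(board) == list(range(1, 20))
    return uses_each_tile_once and all(
        sum(board[i] for i in line) == TARGET for line in LINES
    )


print(len(LINES))  # 15
print(sum(range(1, 20)) == 5 * TARGET)  # True
print(is_solved(list(range(1, 20))))  # False: tiles placed in order do not work
```

No spoilers here: the only board it checks is the tiles laid out in order, which fails.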

I started as anyone might start: manually plugging away.  After getting ‘close’ a few times and failing, I figured I could brute force this, so I opened Excel.  That tactic turned out to be laughable.  With 19! possible arrangements, I needed to narrow down my search.  How many tiles did I need to place to determine the rest of the puzzle?  Well, after some failed attempts, I learned it’s possible to take the 15 equations that should sum to 38 and reduce them down to 12 equations with 7 independent variables!  That means we only need to check 19!/(19-7)! candidate placements, roughly 254 million.  The process used to derive these equations is called Gaussian elimination.  Huge props to hwiechers (Careful, this link contains a spoiler!) for showing me these equations.  I had to take them at face value given my limitations in this area.
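To put numbers on that reduction, here is the arithmetic as a quick Python sketch (math.perm needs Python 3.8 or newer). The variable names are mine, and the runtime is the 1 hour and 3 minutes mentioned above.

```python
import math

TILES = 19
FREE_TILES = 7  # independent variables left after elimination

all_arrangements = math.factorial(TILES)
reduced_search = math.perm(TILES, FREE_TILES)  # 19! / (19 - 7)!

print(f"{all_arrangements:,}")  # 121,645,100,408,832,000
print(f"{reduced_search:,}")  # 253,955,520

runtime_seconds = 63 * 60  # the 1 hour 3 minute run
print(reduced_search // runtime_seconds)  # 67184 candidates per second
```

That is a search space roughly 479 million times smaller than trying every arrangement, which is the difference between an hour and never.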

Once I had these equations it didn’t take much time at all to write the script to take the iterations, solve the equations, and check if everything added up.  My code can be found here: https://github.com/tsunamitreats/aristotlesNumberGame/blob/master/equationPerms.py

Boy am I happy to be done with this puzzle!

Cheers and have a happy New Year!

Coursera: A Survival Guide

It was the first week of the new year and in the spirit of resolutions and new beginnings, I wrote down all the goals I could think of.  After some refining, one of my goals was to take an online class.  I had a few classes in mind, and settled on the class that had the earliest start date.  Now that my class has come to a close, I want to share with you my tips on surviving and how to get the most out of it.

Iconic pencils that you won’t be needing.

The class I settled on was called ‘Computing for Data Analysis’, offered via Coursera.org.  This class was on my short list because it applied to both my professional life and my personal projects.  It teaches you the basics of the statistical programming language ‘R’.  For those of you not familiar with R, it is basically a command line interface for spreadsheets that makes it ‘easy’ to generate graphical representations of your large data sets.  My definition is a simplified one that doesn’t do R justice, but for the layman, I think it does the job.

Professionally, I wanted to be able to have conversations with Data Scientists.  My personal projects involve manipulating and deriving value from VC/Angel and startup data in a repeatable fashion.  I wanted to be able to take data collected via APIs, Excel spreadsheets, and websites and manipulate it in order to answer questions.  Basically, I have multiple pools of data that I needed to sift through to answer some very specific questions for myself.  If you want more details, Jamie Davidson wrote a really cool article that tries to answer “How much venture money should a startup raise to be successful?”  He even published his R code so you can reproduce his research.  Cool stuff!

The point being, you need some motivation to get through this (or any) class.  If you don’t care about the content being taught, then you aren’t very likely to complete the class.  I was fortunate enough to have selected a class that was offering a “Signature Track”, which basically means that the school administering the class, Johns Hopkins, acknowledges your work in the class and you can add a line item to your LinkedIn page.  This wasn’t a motivator for me, but since I had already resolved to complete the class, spending the $50 to earn the certificate was a no-brainer.

So the class is set on a weekly schedule that goes something like this:

  • Weekly video content published.
  • Weekly quiz available.
  • Weekly programming assignment available.
  • Repeat.

Candidly, I’m not a programmer or a developer or a data scientist.  I was definitely in over my head, and very thankful to have taken the Python course offered by Codecademy prior to this course, so I knew a little about syntax and how to ‘talk to a computer’.  The 3-5 hours estimated per week was very low for my learning curve.  I needed to get up to speed and fast.

Without a doubt, the most difficult part of programming is knowing how to ask your question.  Too many questions are asked with hidden assumptions and misaligned expectations.  For this reason, the class forum can be a very noisy place and, once you know how to ask your question correctly, it is often not the best place to get your answer.  That’s why official documentation and established forums are my go-to resources for specific questions.  The class forum is still great because it is moderated by class TAs who can help answer theoretical questions or help you rethink your problem to get you to the right question.  Lastly, if and only if you know what you don’t know, you can leverage the IRC chat rooms.

So if you have the desire and the right resources available to you, I think you’ll be fine.  The last thing I want to leave you with is networking.  If I went to grad school, a big reason would be for the network.  I want to meet people with a passion for success who are on their way up.  The same holds true for online courses.  I took it upon myself to start the ‘Computing for Data Science – Class of 1/2014’ group on LinkedIn.  If your class is small (under ~50 students), you could just share your LinkedIn credentials, but trying to connect with 70k other students is a big data problem in and of itself…  So if you are taking a course, set up a place for your class to network with each other after the class is over.

I would recommend ‘Computing for Data Analysis’ for anyone interested in data manipulation, data science, or big data.  It is a Signature Track and also gives you credit toward Coursera’s ‘Data Science Specialization,’ which could be a differentiator on a resume.

Nimble and the Hybrid/Flash Array Market.

First and foremost, I would like to welcome Nimble to the public market and thank them.  I credit their successful entrance with the bump in the entire tech stock sphere and the increased awareness of the hybrid/flash array (if anyone hasn’t heard about these yet…).  Secondly, I think competition is a good thing, especially in an emerging market.  It keeps everyone honest and promotes innovation.

From a high level, Nimble has many things going for them.  They have had a product in the market for a while now, a (reportedly) growing list of customers, and a successful IPO in a red-hot market.  That said, they have many challenges they still need to face.  In my opinion, their IPO is a symptom of their biggest challenge: an exit strategy.  It’s a loaded comment, I know, but I believe the most viable exit for an all-flash start-up is acquisition.  But even if they wanted to be acquired, all the giants had already placed their bets on other startups, and some even made the decision to develop their own technology instead of buying it.  Nimble had to go public.  Whether their hand was forced or they did so of their own free will, their IPO was a success.  Time will tell if they can convert their recent wins into a viable long-term business.  Their first test will be their quarterly results (their numbers close Jan. 31), but their long-term viability will be evident in their strategic vision and their ability to execute against it.

How do I really feel?  There is a war underway in the flash space.  Here are some highlights from the second half of this year: IBM acquires Texas Memory, Violin Memory goes public, EMC’s XtremIO becomes generally available, and Nimble goes public.  Without a doubt, there is a lot of hype in this space, there are no clear winners yet, and there are real-world use cases and workloads that are demanding this technology.   Will Nimble be able to play with the big boys?  Ultimately, time will tell, but I’ll remain skeptical for now.

EMC’s ScaleIO vs VMware’s VSAN

VSAN is a hot topic today! Without getting into the weeds, I want to briefly describe the similarities between these two technologies and how they can help your IT department, followed by their differences.

Put simply, EMC products will be hypervisor agnostic and VMware products will be storage agnostic. Ask anyone (including analysts, but probably not NetApp), and they’ll say that EMC and VMware products work best hand-in-hand, but they are free to move about each other’s competitors. While this strategy can be a thorn in the side of EMC and VMware, it is ultimately a great source of power. This relationship holds true for ScaleIO and VSAN.

Both of these technologies essentially do the same thing: build a virtual SAN. By using the storage presented to a group of hosts, a virtual SAN is created and then shared within a cluster. ScaleIO is hypervisor agnostic, and can even support physical servers. VSAN, on the other hand, is ESX specific. And given that both technologies are built on DAS, VSAN being ‘storage agnostic’ isn’t much of an advantage in this case. You could say VSAN is like ScaleIO if it was only licensed for VMware…

There are other differences. Both technologies enable hybrid storage (e.g. a combination of spinning disk and flash) in different ways. Their architecture is different (VSAN code is in the kernel, whereas ScaleIO needs its own virtual machine). ScaleIO can scale larger today. And the list goes on.

Candidly, I believe ScaleIO is a better product because users of this product will have a lot of options in deployment and supported infrastructure. That said, if you are a VMware-only shop and you aren’t plagued with excessive scale, then you already have this code on your ESX clusters; you just need to license it.