Feature Detection is Real, and it just Found Flesh – Eating Butterflies

What’s All the Fuss About Features?

Features are interesting aspects or attributes of a thing. When we read a feature story, it’s what the news room feels will be the most interesting, compelling story that draws in viewers. Similarly, when we look at a picture, or a Youtube thumbnail, various aspects of that photo or video tend to draw us in. Over the years (thousands of years), humans have gotten pretty good at picking up visual cues. Our ancestors had to stay away from danger, protect their caves from enemies, detect good and bad intentions from the quiver of a lip, and read all sorts of body language form gestures and dances.

Nowadays, it’s not much different, except that we’re teaching computers to pick up on some of the same cues. In computer vision, features are attributes within an image or video that we’d like to isolate as important. A feature could be the mouth, nose, ears, legs, or feet of a face, the corners of a portrait, the roof of a house, or the cap of a bottle.

It is an interesting area, and a grayscale pixel that makes up just a tiny portion of an entire photograph won’t tell us much. Instead, a collection of these pixels within a given area of interest is what we’re after. If an image can be processed, then certainly we can isolate areas of that photo for further inspection, and match it with like or exact objects – and that is what we’re going to explore.

Types of Detection

We all should know the power that OpenCV brings to the table, and it does not fall short with its methods of feature detection. There is Harris corner detection, Shi-Tomasi corner detection, Scale-Invariant Feature transform (SIFT), and Speeded-Up Robust Features (SURF), to name a few. Harris and Shi-Tomasi both have different ways of detecting corners, and using one over the other comes down mostly to personal preference. Use them and find as many boxes and portraits in images and videos as you like, but we’re looking for the big power brokers. We’re gonna use SIFT in this example, and SURF works great too, but not today my friends.

Both SURF and SIFT work by detecting points of interest, and then forming a descriptor of said points. If you’d like the technical explanation of SIFT, take a look at the source. The explanation goes into depth about how this type of feature detection and matching has enough robustness to handle changing light conditions, orientations, angles, etc. Pretty, pretty…pret-tyyy good stuff.


The basic workflow we’ll be using is to take an image, automatically detect features which would make this object unique from similar objects, attempt to describe those features, and then compare those unique features (if any are found) with another image or video that contains the original image/video, perhaps in a group to make it more challenging. Imagine, if we keyed a particular make and model of a car, setup a camera, and waited to see if and when that car showed up in front of our house again. You could setup a network of cameras to look for a missing person, or key an image of a lost bike for cameras around a college campus. Just make sure you do so within the laws of your jurisdiction, of course.

The bulk of the work for matching keyed features is handled by a version of the unsupervised clustering classifier, the k-Nearest Neighbor (kNN) algorithm. In our example, we train a kNN model to detect descriptors from our original training image, and then use a query set to see if we find matches.

Code Exploration and Download Resources

Here’s our original image to start with. That butterfly is known as the Purple Emperor, and it is beautiful, oh yes. And it also feeds on rotting flesh. Be an admirer when you’re combing the British countryside, but not too close. Full code and resources can be downloaded here.

Use Python 3 if you can, an installation of OpenCV (3+ if possible), and the usual Numpy and Matplotlib. It may be necessary to install along with OpenCV, opencv-contrib. In my case using Python 3+, I had installed OpenCV though Homebrew on a Mac/Linux install, and had to find an alternative way to invoke the SIFT command:

import cv2
import matplotlib.pyplot as plt
import numpy as np
n_kp = 100 #limit the number of possible matches
# Initiate SIFT detector
#sift = cv2.SIFT() #in Python 2.7 or earlier?
sift = cv2.xfeatures2d.SIFT_create(n_kp)

If you still have trouble getting your interpreter to recognize SIFT, try using the Python command line or terminal, and invoking this function:


Then exit and run these lines to see if everything checks out:

>>import cv2
>>>image = cv2.imread("any_test_image.jpg")
>>>sift = cv2.xfeatures2d.SIFT_create()
>>>(kps, descs) = sift.detectAndCompute(gray, None)
>>>print("# kps: {}, descriptors: {}".format(len(kps), des

If you get a response, and not an error message, you’re all set. I’ve set the number of possible feature matches to 100 with the variable n_kp , if only to make our final rendition more visually pleasing. Try it without this parameter – it’s ugly, but gives you a sense of all the features that match; some are more accurate than others.

img1 = cv2.imread('butterfly_orig.png',0) # queryImage
img2 = cv2.imread('butterflies_all.png',0) # trainImage
# find the keypoints and descriptors with SIFT
keyp1, desc1 = sift.detectAndCompute(img1,None)
keyp2, desc2 = sift.detectAndCompute(img2,None)
src_params = dict(checks = 50)
idx_params = dict(algorithm = FLANN_INDEX_KDTREE, trees =

With our training and test images set, we send SIFT off of detect features. We set MIN_MATCHES to 10, meaning that we’ll need at least 10 of the 100 max possible matches to be detected for us to accept them as identifiable features. Making use of the Fast Library for Approximate Nearest Neighbors (FLANN) classifier, we now want to actually search for recognizable patterns between our original training set and our target image. First, we’ve set up index idx_params , and search src_params , and we’ll run the method FlannBasedMatcher to perform a quick search to determine matches.

flann = cv2.FlannBasedMatcher(idx_params, src_params)
matches = flann.knnMatch(desc1,desc2,k=2)
# only good matches using Lowe's ratio test
good_matches = []
for m,n in matches:
if m.distance < 0.7*n.distance:
#good_matches = filter(lambda x: x[0].distance<0.7

We’ve also saved to matches , a run of descriptors from both images to see if they match. From flann.knnMatch will come up with a list of commonalities between the two sets of descriptors. Keep in mind that the more matches that are found between the training and query (target image) set, the more likely that our training pattern has been found in our target image. Of course, not all features will accurately line up. We used k=2 for our k parameter, and that means the algorithm is searching for the two closest descriptors for each match.

Invariably, one of these two matches will be further away from the correct match, so to filter out the worst matches, and filter in the best ones, we’ve setup a list and a loop to catch the good_matches . By using the Lowe’s ratio test, a good match comes along when ratio of distances between the first and second match is less than a certain number – in this case 0.7.

We’ve now found our best-matching keypoints, if there are any, and now we have to iterate over them, and do fun stuff like draw circles and lines between key points so we won’t be confused at what we’re looking at.

if len(good_matches)>MIN_MATCHES:
src_pts = np.float32([ keyp1[m.queryIdx].pt for m in g
ood_matches ]).reshape(-1,1,2)
train_pts = np.float32([ keyp2[m.trainIdx].pt for m in
good_matches ]).reshape(-1,1,2)
M, mask = cv2.findHomography(src_pts, train_pts, cv2.R
matchesMask = mask.ravel().tolist()
h,w = img1.shape
pts = np.float32([ [0,0],[0,h-1],[w-1,h-1],[w-1,0] ]).
dst = cv2.perspectiveTransform(pts,M)
img2 = cv2.polylines(img2,[np.int32(dst)],True,255,3,
print("Not enough matches have been found - {}/{}".for
matchesMask = None
draw_params = dict(matchColor = (0,0,255), singlePointColo
r = None,matchesMask = matchesMask,flags = 2)
img3 = cv2.drawMatches(img1,keyp1,img2,keyp2,good_matches,
plt.imshow(img3, 'gray'),plt.show()

We store src_pts and train_pts , where each match, m , is a store of the index of keypoint lists. m.queryIdx refers to the index of query key points, and m.trainIdx refers to the index of the training key points in the keyp2 list. So, our lists and matching key points are saved, but what’s this cv2.findHomography ? This method makes our matching more robust, by finding the homography transformation matrix between the feature points. Thus, if our target image is distorted or has a different perspective transformation from the camera or otherwise, we can bring the desired feature points into the right plane as our training image.

RANSAC stands for Random Sample Consensus, and it involves some heavy lifting whereby the training image is used to determine where those same, matching features might be in the new image that may have been twisted, tilted, warped, or otherwise transformed. The best matches after any transforms are known as inliers, and those that didn’t make the cut are called outliers. Again, if you take a moment to think about that, the power of feature detection after significant transforms is pretty interesting…maybe a bit too interesting.

We then draw lines to match the original image key points with the query transforms, and the output might look something like this:

I’m calling this the butterfly effect.


It’s not as if you needed another example, but OpenCV’s computer vision is a powerful tool. We grabbed an image, classified its unique features, then partially obstructed that image in a busier query image, only to find that our feature detection was so strong that it had no problem whatsoever finding and matching the features and descriptors from our target image. The applications of this technology are being implemented today, and if you explore it now, you’ll be well on your way to creating something for tomorrow.

Facebook Might be Spying on Us, but it Makes for Pretty Graphs

Graphs, Graph theory, Euler, and Dijkstra

As tasks become more defined, the structures of data used to define them increase in complexity. Even the smallest of projects can be broken down into groups of smaller tasks, that represent even smaller sub-tasks. Graphs are a data structure that helps us deal with large amounts of complex interactions, in a static, logical way. They are widely used in all kinds of optimization problems, including network traffic and routing, maps, game theory, and decision making. Whether we know it or not, most of us have had experiences with graphs when we interact with social media. It turns out that graphs are the perfect data structure to describe, analyze, and keep track of many objects, along with their relationships to one another. However, despite the ubiquity of graphs, they can be quite intimidating to understand. In the interest of curiosity and science, today is the day we’re going to tackle graphs, and wrestle with that uneasy feeling we get when we don’t have the slightest clue how something in front of us works.

In order to explore graphs, we’re gonna take a look at what makes a graph, cover some of the math behind it, build a simplified graph, and begin to explore a more complex social graph from Facebook data. Nodes, or vertices of a graph are like the corners of a shape, while the edges are like the sides. Edges connect with corners in all directions, giving graphs the ability to take on any shape. Any graph can be depicted with G = (V, E) , where E is the set of edges, and V is the set of vertices. Larger graphs just have more nodes, and more edges means more connectivity.

Computers find it more convenient to depict graphs as an adjacency matrix, otherwise known as a connection matrix. An adjacency matrix is made up of a square matrix that consists of only 0’s and 1’s (binary). In this binary matrix, a 1 represents a spot in the graph were an edge goes from vertex to vertex. If there is no edge running between, say vertex i and vertex j, there will be a 0 in the matrix. A good bit of graph theory can be attributed to 18th century Swiss mathematician Leonhard Euler. Euler is known as one of the most influential and prolific mathematicians of all time, with contributions in the fields of physics, number theory, and graph theory. His work on the famous Seven Bridges of Königsberg problem, where one had to decide if they could cross each bridge only once in a round-trip back to the starting point, resulted in a number of revelations concerning graph theory. Among those revelations was the discovery of the formula V-E+F=2 , having to do with the number of vertices, edges, and faces of a convex polyhedron.

Weighted, Directed, and Undirected Graphs

Weighted graphs are those that have a value associated with an edge. The weight of each edge corresponds to some cost relationship between its nodes. This cost could be distance, power, or any other relationship that relates an edge to a node. The only difference between this and an unweighted graph, is that a weighted adjacency list includes an extra field for the cost of each edge in the graph.

A directed graph is a set of objects where all the edges have a direction associated with them. You could think of most social networks as directed graphs, because direction matters when you consider the terms followers and following. Kim Kardashian certainly doesn’t follow all of her followers, rather, her 140-plus million edges are directed towards her node in a way that makes her quite influential. We’ll take a look to explore this kind of network influence a bit later when we build a graph.


Edsger Dijkstra was a Dutch systems scientist, who published Dijkstra’s Shortest Path First Algorithm (SPF) in 1956. The algorithm finds the shortest paths from the source (origin) node to all other nodes. Simplified, the algorithm works under these rules:

  • For each new node that is visited, choose the node with the smallest known distance/cost to visit first.
  • Once at the newest node, check each of its neighboring nodes.
  • For each neighboring node, calculate the cost of the neighboring nodes by summing the cost of all the edges, from the starting vertex, which lead to this node.
  • If the cost to this node is less than a known (labeled) distance, this is the new shortest distance to this vertex.
  • This loop continues to run through all nodes until our algorithm is done running.

Basically, this is a find and sort algorithm, where we are searching for nearby nodes, labeling them as found and measured, or found and not measured.

A Map of Manhattan

A while back, I visited some family in Manhattan. Most days I was there, I ended my trip on the Lexington Avenue line at the 125th St. station. As I walked (through the cold) from my source to wait for my train, I traversed a series of forgetful left and right turns, covering a jagged path, where the total distance was the absolute distance between turns, or Manhattan distance. Once I was underground in the subway, the train took mostly a straight line path, and that’s Euclidean distance; also known as the distance a bird flies. One weekend we decided to visit some tourist-y spots, and as we were deciding which places to visit on the map, and in which order, it looked something like this:

With this graph, the edges between points represent distances. If we wanted to minimize the cost from Chelsea Market © to the New York Stock Exchance (S), we could find the shortest path to S. In reality, we would want to visit all locations, but in this example we’re simply going for the absolute shortest route possible. Of course, that’s another good question: which route order signifies the shortest distance if all five destinations are desired? I may or may not leave that for you to explore on your own.

Code Exploration

All you need to have installed to explore graphs in this example is Python (preferably 3+), Matplotlib, and NetworkX. Instructions on how to properly install and get started with NetworkX can be found from their documentation. Later, we’ll download some social network data as a groundwork for analyzing much more complex graph networks. If you’d like to follow along in an interactive coding environment without having to install everything locally, the full code can be found in this IPython/Jupyter environment.

Soon, you might be surprised at how simple it is to create graph representations of many real-world objects. To start, let’s initialize a graph object, and add nodes and weighted edges to it:

import networkx as nx
import matplotlib.pyplot as plt
G = nx.Graph()
G.add_edge('S', 'F', weight=2)
G.add_edge('F', 'W', weight=1.2)
G.add_edge('F', 'P', weight=1.5)
G.add_edge('P', 'C', weight=0.8)
G.add_edge('P', 'W', weight=1.1)
G.add_edge('W', 'C', weight=0.4)

Now, we draw the graph and label the edges:

pos = nx.spring_layout(G, scale=3)
nx.draw(G, pos,with_labels=True, font_weight='bold')
edge_labels = nx.get_edge_attributes(G,'r')
nx.draw_networkx_edge_labels(G, pos, labels = edge_labels)
all = (nx.nx.floyd_warshall(G))

Spoiler: So I’ve already given you an idea of determining the distances from any point from the source, by using the floyd_warshall method. The returned object is a dictionary with nodes as keys, and distances as edges from the source node. You should notice that this would only solve part of our issue if we want to actually trace the path that a traveler might take if they wanted to traverse the whole route. Instead, it gives us the distance of each point from the source, not the distance between each point. Let’s keep going.

Take a look at nx.nx.spring_layout(G) . We’ve seen this before when we were setting up and drawing our graph, but we saved it in a variable, so it bears explanation. As you can see, the returned object is a dictionary of positions keyed by node. Aha! This would be the key to finding the relative positions of the nodes on a Cartesian coordinate plane. As we look back, we can see that we did in fact save these positions to the variable pos before we drew the graph. If you comment out the position step, or neglect the pos parameter in the drawing step, you’d find that the node positions would be random instead of fixed points. Effectively, just a group of connected points floating around in space, but not here; we have fixed nodes.

With the shortest_path method, we have the Dijkstra-derived algorithm tell us the final order of the shortest-first search winner, from node C to node S. You could change these parameters to come up with an alternate route if you were so inclined. If that’s not enough, we print out the length of this path, which all adds up when you do the arithmetic.

And now we play around a bit with some other functions to get more familiar with graph networks. In terms of the degree of ‘connectedness’ that each node has, you’ll use degree . That’s just going to tell us how many edges are coming out of a node. As for clustering, it is defined as:

The local clustering of each node in G is the fraction of triangles that actually exist over all possible triangles in its neighborhood. (source)

Essentially, how many nodes occupy the immediate space relative to other clusters. When you’re exploring power and influence in a network, you might look at centrality. Eigenvector_centrality gives an indication of not only how connected a node is, but how important those incoming connections are. P and W seem to be the most powerful nodes in our little network. Yet another network measure is betweenness_centrality , that tries to gauge those nodes that help form connections from distant nodes. In our example, it comes as no surprise that node F holds the throne in betweenness, effectively bridging the gap between Greenwich Village, and downtown Lower Manhattan.

Now it makes more sense why location bears so much importance in real estate, business, and other arenas. If you lack visibility within a network (city), it might be hard to turn an isolated node into a node that has high betweenness or centrality. On the other hand, you can see why office parks, malls, and strip malls can do wonders for businesses; think about those kiosks you see in airports, or vendor booths at special events.

Facebook Data

Facebook means many things to many people, but one thing that cannot be argued is the vast amount of data that can be found there. If you’re looking for it, you can most certainly find it, and Stanford has cleaned up some social data for us to use. You will need to download the zip file labeled facebook_combined. When you run the code in the notebook, and properly upload your downloaded file (it gets erased on each instance) it should look something like this:

Wow – Take a deep dive into that with some of the methods we just learned!

Calculating the Size of Objects in Photos with Computer Vision

Table of Contents

  • Overview
  • Setup
    • Windows
    • Linux
    • OSX
  • OpenCV Basics
  • Getting Started
  • Takeaways


You might have wondered how it is that your favorite social networking application can recognize you and your friends’ faces when you tag them in a photo. Maybe like Harry Potter, a mysterious letter arrived at your doorstep on your birthday; only this letter wasn’t an invitation to the esteemed Hogwarts Academy of Wizardry and Witchcraft, it was a picture of your license plate as you sped through an intersection. A fast- growing segment of artificial intelligence known as computer vision is responsible for both scenarios, as well as a host of other applications you will likely become familiar with in the near future.

The applications of computer vision are endless, both in utility and technical impressiveness, and if you haven’t already, it’s about time you began to witness the power that modern computing technology affords you. The time for painstakingly plodding a path through the dense mathematical forest of how exactly your phone can add funds to your bank account simply by taking a picture of your check has come and gone. Instead, let’s quickly cover only the basic yak-shaving required to spark your interest in how to get from zero to sixty, without the ticket to match.


The tool of choice to foray into how to see the world like a robot is OpenCV. This Python module is a virtual Swiss Army knife that will outfit our computers with bionic abilities. To do so however, we must first overcome setup and installation, which has become much easier than in years past. Depending on your machine and operating system, it should not take an average user with a novice to intermediate level of coding experience any more than 30 minutes, but if there are complications that your computer can’t stomach at first, be patient and in under an hour it will be worth it.


A few prerequisites to installing OpenCV are Matplotlib, SciPy, and NumPy. Downloading and installing the binary distributions of SciPy and NumPy, and Matplotlib from the source are the way to go. The installations of OpenCV change with the regularity you would expect from maintaining a large codebase, so check for the latest download instructions on the OpenCV website. Any other prerequisites that your system needs will be asked for during the setup process.


Most distributions of Linux will have NumPy preinstalled, but for the latest versions of both SciPy and NumPy, using a package manager like apt-get should be the easiest route. As far as OpenCV, the path of least resistance is to consult with the well-maintaind OpenCV Docs. This resource will walk you through the installation, as well as certain caveats and common troubleshooting gripes.


If you have OSX 10.7 and above, NumPy and SciPy should come preinstalled. All of the main sources mentioned above will cover prereqs, and as for OpenCV itself, Homebrew is the most convenient solution. If you don’t have it installed already, head over to the Homebrew (brew.sh) package manager. In most cases, once brew is installed, the instructions boil down to these basic commands: brew doctor , followed by, brew install opencv , or in error-prone cases, brew install opencv — env=std . In some instances, you may be asked by Homebrew to update a PYTHONPATH, which may involve opening the new (or existing) .bash_profile file in the text editor of your choice, and saving export PYTHONPATH=/usr/local/lib/python2.7/sitepackages:$ PYTHONPATH , or something like that there.

Patiently await the downloads and you should soon have everything installed! Check your installation by launching your Python interpreter with an import cv2 command, and there should be no error messages.

OpenCV Basics

The very basic gist behind OpenCV, along with NumPy, is that they are using multi-dimensional arrays to represent the pixels, the basic building blocks of digital images. If an image resolution is 200 by 300, it would contain 60,000 pixels, each of varying intensities along the RGB scale. You may have seen the RGB triple expressed similar to this (255,0,0) when dealing with a digital color palette from graphic design software or online image editor. Each pixel is assigned like this, and together they form an image, which can be represented as an entire matrix. This matrix of RGB tuples is what OpenCV is good at manipulating.

For this project, we’re going to examine some of the features that, when combined, can lead to some really interesting applications right out of the box. I’d like to see how accurately I can measure the size of some arbitrary objects in a photo.

Getting Started

Since my son left them on the floor, and I stepped on them, I’ve taken a picture of his Hotwheels Tesla car, and a little birdie thing. To make this experiment more straightforward, I’ve added a reference object (a penny), whose size we already know in the image as well. From this reference object and the resulting pixel_to_size ratio, we’ll determine the sizes of other objects in the image.

The basic setup is to be able to run our script from the command line by feeding it the desired image, then it finds the object or objects of interest in the image, bounds them with a rectangle, measures the width and height of the images, along with some visible guides, and displays the output, right to our screen. You may need to pip install imutils or easy_install imutils, which is a package that makes manipulating images with OpenCV and Python even more robust.

Name a file thesizer.py, and input this code:

from scipy.spatial import distance as dist
from imutils import perspective
from imutils import contours
import numpy as np
import argparse
import imutils
import cv2
# construct the argument and parse command line input
aparse = argparse.ArgumentParser()
aparse.add_argument("--image", required=True,help="image p
aparse.add_argument("--width", type=float, required=True,h
elp="width of far left object (inches)")
args = vars(aparse.parse_args())
# load the image, convert it to grayscale, and blur it a b
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (7, 7), 0)

We first create a way to let the script know which image we want to use. This is done using the argparse module. We tell it that we are inputting an image, followed by our reference width. I’ve used a penny for our reference width, and Wikipedia tells me that our 16th president, Abraham Lincoln’s copper bust, measures 0.75 inches across. When we finally run the script on the command line, we’ll use this format: python thesizer.py –image hotwheel.png –width 0.75 . This argument parser is quite reusable, especially for future machine learning projects that you might encounter.

# perform edge detection + dilation + erosion to close gap
s bt edges
edge_detect = cv2.Canny(gray, 15, 100) #play w/min and max
values to finetune edges (2nd n 3rd params)
edge_detect = cv2.dilate(edge_detect, None, iterations=1)
edge_detect = cv2.erode(edge_detect, None, iterations=1)

Edge detection, dilation, and erosion are methods that will no doubt pop up on most image manipulation/computer vision tasks. A great habit to begin mastering and crafting your own projects, is to dive into the more important methods used under the hood by studying the source documentation. Edge detection can be a complex subject if you want it to be. It’s one of the building blocks of computer vision, and should raise your curiosity if you like looking under the hood to find out how things work. The OpenCV docs, while definitely having an old-school vibe, are actually pretty detailed and informative. What we’ve done with the gray variable was to turn our image grayish, helping define the edges and contours. Now, the Canny method, named after its founder John F. Canny, uses a combination of noise reduction and something called intensity gradients to determine what are continuous edges of an object, and what is probably not. If you want to see what our poor man’s Terminator sees at this point, you could just display edge_detect by adding cv2.imshow(‘Edges’,edge_detect) . It would look something like this:

If you use your imagination a bit, you can start to see how Cyberdyne Systems was able to have the T1000 identify motorcycles, leather jackets, and shotguns in the future.

# find contours in the edge map
cntours = cv2.findContours(edge_detect.copy(), cv2.RETR_EX
cntours = imutils.grab_contours(cntours)
# sort contours left-to-right
(cntours, _) = contours.sort_contours(cntours)
pixel_to_size = None
# function for finding the midpoint
def mdpt(A, B):
return ((A[0] + B[0]) * 0.5, (A[1] + B[1]) * 0.5)

The findCountours method further identifies what we would consider contours of various whole objects in our image. We sort them left-toright, starting with our reference penny. Knowing that the penny goes first, we can use our pixel_to_size ratio to find out the sizes of the other objects. We’ve just initialized the penny ratio here, and we’ll use it later. Lastly, we create a function to find the middle of the object lines that we’ll draw later, so keep that in mind.

# loop over the contours individually
for c in cntours:
if cv2.contourArea(c) < 100: #ignore/fly through cont
ours that are not big enough
# compute the rotated bounding box of the contour; sho
uld handle cv2 or cv3..
orig = image.copy()
bbox = cv2.minAreaRect(c)
bbox = cv2.cv.boxPoints(bbox) if imutils.is_cv2() else
bbox = np.array(bbox, dtype="int")
# order the contours and draw bounding box
bbox = perspective.order_points(bbox)
cv2.drawContours(orig, [bbox.astype("int")], -1, (0, 2
55, 0), 2)

Everything else in this script runs under this for loop. Our contours now define what we think to be the isolated objects within the image. Now that that’s complete, we make sure that only contours/objects that have an area larger than 100px will stay to be measured. We define bounding boxes as rectangles that will fit over  the  objects,  and  turn them into Numpy arrays. In the last step  we  draw  a  green  bounding box. Note that OpenCV reverses the order of Red, Green, and Blue, so   Blue is the first number in the tuple, followed by Green, and Red.

Basically, all that’s left is to draw our lines and bounding points, add midpoints, and measure lengths.

# loop over the original points in bbox and draw them; 5px
red dots
for (x, y) in bbox:
cv2.circle(orig, (int(x), int(y)), 5, (0, 0, 255),
# unpack the ordered bounding bbox; find midpoints
(tl, tr, br, bl) = bbox
(tltrX, tltrY) = mdpt(tl, tr)
(blbrX, blbrY) = mdpt(bl, br)
(tlblX, tlblY) = mdpt(tl, bl)
(trbrX, trbrY) = mdpt(tr, br)

Here’s where we use our midpoint function, mdpt . From our four bounding box points that enclose our object, we’re looking for half-way between each line. You see how easy it is to draw circles for our bounding box points, by using the cv2.circle() command. Without cheating, can you tell what color I’ve made them? If you guessed Blue… you’re wrong! Yep, Red – There’s that order reversal that OpenCV likes to use. Red dots, 5px big. When you run the code yourself, change some of these parameters to see how it alters what we’re drawing or how the bounding boxes might get thrown off by poor countours, etc.

# draw the mdpts on the image (blue);lines between the mdp
ts (yellow)
cv2.circle(orig, (int(tltrX), int(tltrY)), 5, (255, 0,
0), -1)
cv2.circle(orig, (int(blbrX), int(blbrY)), 5, (255, 0,
0), -1)
cv2.circle(orig, (int(tlblX), int(tlblY)), 5, (255, 0,
0), -1)
cv2.circle(orig, (int(trbrX), int(trbrY)), 5, (255, 0,
0), -1)
cv2.line(orig, (int(tltrX), int(tltrY)), (int(blbrX),
int(blbrY)),(0, 255, 255), 2)
cv2.line(orig, (int(tlblX), int(tlblY)), (int(trbrX),
int(trbrY)),(0, 255, 255), 2)
# compute the Euclidean distances between the mdpts
dA = dist.euclidean((tltrX, tltrY), (blbrX, blbrY))
dB = dist.euclidean((tlblX, tlblY), (trbrX, trbrY))

Not much going on here except drawing blue midpoints of lines, 5px big. dA and dB are a bit more interesting, because we are computing the distance between bounding box points. We did this with the euclidean() method of the dist object that we imported from the SciPy library at the start of our script.

On to the finale:

# use pixel_to_size ratio to compute object size
if pixel_to_size is None:
pixel_to_size = dB / args["width"]
distA = dA / pixel_to_size
distB = dB / pixel_to_size
# draw the object sizes on the image
cv2.putText(orig, "{:.1f}in".format(distA),
(int(tltrX - 10), int(tltrY - 10)), cv2.FONT_HERSH
EY_DUPLEX,0.55, (255, 255, 255), 2)
cv2.putText(orig, "{:.1f}in".format(distB),
(int(trbrX + 10), int(trbrY)), cv2.FONT_HERSHEY_DU
PLEX,0.55, (255, 255, 255), 2)
# show the output image
cv2.imshow("Image", orig)

Here’s where the magic happens. We can now employ our penny ratio to find the size of the other objects. All we need is to use one line divided      by our ratio, and we know how long and  wide  our  object  is.  It’s  like using a map scale to convert an inch into a mile. Now we superimpose the distance text over our original image (which is actually a copy of our original image). I’ve rounded this number to one decimal place, so that would explain why our result shows our penny as having a height and width of 0.8 inches. Rest assured skeptics, it has rounded up  from  a perfect 0.75 inches; of course, you should change the accuracy to two decimal places yourself, just to make sure. Our last two lines are commands to display our image and rotate through the drawn bounding boxes on any key press.


I told you we would dive right in. You may want to try snapping a similar photo of your own and tinkering with many of these reusable code snippets. Of particular interest are these methods, that will popup again and again in your future computer vision projects:

  • cv2.cvtColor() for graying images
  • cv2.Canny() for edge detection
  • cv2.findContours() for whole object detection
  • cv2.boxPoints() for creating bounding boxes (CV3)
  • cv2.circle() and cv2.line() for drawing shapes
  • cv2.putText() for writing text over images

As you can see, the world of computer vision is unlimited in scope and power, and OpenCV unlocks the ability to use machine learning to perform tasks that have traditionally been laborious and error-prone for humans to do en masse, like detect solar panel square footage for entire zip codes, or define tin rooftops in African villages. We encourage you to empower yourselves by diving in and experimenting.

Exploring Baseball Data to Predict Run Totals

Table of Contents

  • Overview
  • Support Vector Machines (SVM) The Data
  • Setup and Installation
  • Exploring and Modeling the Data Takeaways


The object of this tutorial is to give you an idea of what a process might look like when a data scientist explores data with the intention of using machine learning to make predictions. Crafting a strategy to classify or regress data involves equal parts art and science, and trial and error and parameter tuning are all part of the game. For this exercise, I’ve chosen a stage ripe with data: baseball. Of course, in order to gain the most value, to be able to integrate this process into whatever projects you’re working on, you should download the data and follow along with the code, and experiment by making changes to anything you can get your hands on.

Support Vector Machines (SVM)

Before we get started, let’s first procrastinate a bit, and digest the basics of the ML algorithm we’re gonna use. Support Vector Machines are supervised machine learning algorithms that have shown success both in classification and regression since the 1960s. They have a clear method of operation that can be applied to complex problems, and for this reason, as well as their accuracy, they remain popular in modern fields, such as bioinformatics, optimization, and algorithmic trading.

The main theme behind the workings of this machine learning algorithm, is the separation of data points in n-dimensional space, by what are called hyperplanes. Each of your features represents one n- dimension. Hyperplanes are basically separators that try to separate your classes of data into different groups, first by similarity, then by distance from one another. The object of the separation is to find the plane that maximizes the distance between classes of your data. The kicker is that during the training of your model, if there is no plane that can effectively separate the classes of data, what is known as a kernel is used to perform quite complex transformations of the data, increasing the dimensions of the data to find better hyperplanes.

SVM kernels can be linear or non-linear, so they can take complex forms that can attempt to morph to your data, often creating interesting planes of demarcation. Because they can take so many forms while finding an optimized hyperplane, tuning model parameters can lead to significantly different results. Later on, we’ll go over some solutions for how to attack this issue.

The Data

A model can’t do much without data, so let’s discuss ours, and then take a look. As it turns out, baseball can be viewed as a simple game that includes a bat, a ball, a hitter, a pitcher, and some other players; that doesn’t sound too complicated. When constructing my model, I realize that it will never replicate reality, instead, we look to simplify reality as much as possible while maximizing the accuracy of predictive outcomes. That being the case, when constructing an accurate model, the mystery is in pinning down those predictive variables which are most responsible for the outcomes. There are many ways to do this, and we will not go over them here, but you should definitely research feature selection if you want to build better models. I’ve done my own feature selection, that mostly concentrates on the pitcher-batter interaction, environment, and how well the ball will fly when hit.

The goal of each pitcher-batter interaction revolves around the fundamental action of hitting, and that involves the flight of the ball before and after a hit. Turns out that the flight of a ball is kinda complicated. After looking through some historical baseball data sources like Baseball Reference and others, and a little feature extraction, I’ve come up with some factors that I think might be important. Some of these features are taken directly from the data sources, and others are derived features, like air_dens. As I found out, the flight of the ball has to do with factors such as altitude and temperature. At higher altitudes and higher temperatures, air molecules are less densely packed, allowing balls to fly farther.

Download the dataset here

Setup and Installation

You should have Scikit Learn, Matplotlib, SciPy, NumPy, Pandas, IPython/Jupyter installed to follow along. It seems like a lot, but it’s not really much, and each of the sources provides excellent documentation   to get up and running on your system. Moreover, it’ll all be up to date. We’re using Python, so surely you’ll need that.

Exploring and Modeling the Data

By using the Pandas command columns, we can view the columns of our dataset. Explore the size and general lay of the land with the DATAFRAME_NAME.describe() command – in this case it would be df_mlb.describe(). Looks like there are almost 600  rows  of  data, much of which is binary (zeros and ones), and  categorical  in  nature. We’ve already made determinations as to whether the park is a pitcher’s park or a hitter’s park, whether the game  was  interleague  or  not, whether the game was in a dome, whether the home team had lost the    last four games in a row (L4r_h), etc. Other data is continuous, like air_dens (air density),  temp (temperature), etc. For this project, what’s important is how we go about predicting the total score of the game using SVM.

Preferably, open up a new Jupyter notebook, or a text editor if you don’t have Jupyter Notebook. Alternatively, Google Colaboratory may make setup a bit easier. In one cell, enter this:

cols_to_transform = ['3ormore', u'avd_sun_sweep',
'bott25rec_h', 'bott25rec_rd','elite_h',
'elite_rd','fatigue_h', 'fatigue_rd',
'hfelite_h', 'hfelite_rd', 'intrlg', 'L4r_h',
'L4r_rd', 'LLby8', 'notenuf', 'notenuf_mild',
df_mlb_dummies = pd.get_dummies(cols_to_transform)
clean_mlb = df_mlb.replace(['yes', 'no'], [1, 0])
clean_mlb = clean_mlb.fillna(0)

What we’ve done here is to create a list, cols_to_transform, that takes all the columns we want to make sure are categorized in a binary fashion. This will ensure that our SVM classifier can view our data as a matrix. We replace any ‘yes’ and ‘no’ categorizations with numerical 1s and 0s respectively. Let’s bring in the heavy lifters by importing the necessary modules. This is typically done at the top of our file:

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from from lm = sklearn import svm
sklearn.model_selection LinearRegression() import train_test_split
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import GridSearchCV

You should get no error messages on a successful import, but if you do, download any packages that it indicates might be missing. Next, let’s construct our classifier. Before we do so, let’s quickly browse the documentation for a list of parameters for our classifier.

sklearn.svm.SVC(C=1.0, kernel=’rbf’, degree=3,
gamma=’auto_deprecated’, coef0=0.0, shrinking=True,
probability=False, tol=0.001, cache_size=200,
class_weight=None, verbose=False, max_iter=-1,
decision_function_shape=’ovr’, random_state=None)

As you can see, there are quite a few parameters in there, dealing with kernels, gamma, degrees, etc. You could browse the docs to become more informed on those parameters – they are important. But right now, let’s not lose momentum on making baseball total score predictions.

Let’s get our classifier up and running by using a less complicated version so we can see some action:

clf = svm.SVC(gamma=0.0001, C=256,tol=0.001)
from sklearn.preprocessing import MinMaxScaler
scaling = MinMaxScaler(feature_range=(-1,1)).fit(clean_mlb[mlb_features][300:586])
X_train = scaling.transform(clean_mlb[mlb_features][300:586])

I’ve chosen to give a relatively large penalty © for error, and a gamma (kernel coefficient) of 1/1000th. This gamma parameter has to do with the sensitivity to differences in feature vectors. It depends on the degree of dimensions among other things. If gamma is too large, you may overfit your data, meaning that you will be fitting data too closely to your training set instead of fitting the data to structures that will accurately predict real-life, ground truth data. To help with support vector classification, we’ve scaled and transformed the data into an array that can be easily interpreted by the SVM. The transformed data is saved in the variable X_train.

Why did I just save rows 300 to 586? This was my way of splitting the data into training and testing sets. Sometimes, you have lots of data, and other times, with less data available, you have to use part of your data to train (X), and the rest to test dependent variables (y). There are other ways to do this, but this will work for our purposes. Now we’d like to fit the training data to our classifier:


Our dependent/outcome variable that we are attempting to predict, is the total score of the game, t_score_fg . Lastly, in order to see our predictions and the real outcomes of the games side-by-side, we need to construct a dataframe that displays prediction and outcome in the same table:

desc = df_mlb.ix[201:300][['desc','wind','air_dens','dev',
desc = desc.fillna(0)
predict = pd.DataFrame(clf.predict(test_mlb[mlb_features][201:300]))
combined = pd.concat([predict,desc],axis=1)
combined.rename(columns={0: 'predict_svc'}, inplace=True) combined

Now, predict_svc gives the predictions, and t_score_fg gives the actual totals. Okay, but were these predictions any good? That depends on what you consider good. The good news is we don’t have to use our own judgement to determine the error in our model. Sklearn has a quick little method for determining the Mean Absolute Error (MAE), or the average error over the data set:

from sklearn.metrics import mean_absolute_error
y_pred = predict
y_true = test_mlb.t_score5[201:300]
mean_absolute_error(y_true, y_pred)

Using these parameters, it looks like the average total prediction is off by over 4 runs. How could we reduce this? Well, one place to start is with the parameters. While I basically arbitrarily chose the parameters, you could save time finding good parameters by having the computer do the heavy lifting. Sklearn has tools for this:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
X, Y = clean_mlb[mlb_features][300:500],clean_mlb.t_score5[300:500]
# It is usually a good idea to scale the data for SVM trai ning.
# We are cheating a bit in this example in scaling all of the data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# For an initial search, a logarithmic grid with basis
# 10 is often helpful. Using a basis of 2, a finer
# tuning can be achieved but at a much higher cost.
C_range = 2. ** np.arange(1,11)
gamma_range = 10. ** np.arange(-10, -1)
param_grid = dict(gamma=gamma_range, C=C_range)
grid = GridSearchCV(svm.SVC(), param_grid=param_grid, cv=S
grid.fit(X, Y)
print("The best classifier is: ", grid.best_estimator_)

This process uses grid search to find the optimal parameters to use in our SVM model. Now, if you retrain our SVM using the new, suggested clf variable, you’ll get a better model. Here are the newly suggested parameters:

clf = svm.SVC(C=2.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=1e-10, kernel='poly',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)

You’ll find that it does achieve a lower MAE, but if you run the prediction again you’ll find that the same prediction is made for every game. What that means is that, given our predictive variables, our SVM model finds that you achieve the least model error by predicting a total score of 7 for every game. Well, that’s kind of a bummer, but then again, it means that our model is working pretty darn well. After all, in real baseball, a final score of around 7 or 8 could be considered the average total score. That bodes well for a nice foundation to build upon in the future.


Perhaps you’d like to use a Raspberry Pi or Arduino to predict the amount of water your plants need, or to determine how much food your robot food dispenser needs to give to your dogs, based on season or temperature. You could build a drone that uses a camera and an SVM to classify images into groups. The utility and useability of SVMs is high, and they run fairly fast compared to other similarly performing machine learning algorithms. We encourage you to create and experiment with this data set, and then create your own, or find some interesting publicly-available sets out there to dig into: wearable data, maps data, or whatever you can dream up.