Calculating the Size of Objects in Photos with Computer Vision

Table of Contents

  • Overview
  • Setup
    • Windows
    • Linux
    • OSX
  • OpenCV Basics
  • Getting Started
  • Takeaways


You might have wondered how it is that your favorite social networking application can recognize you and your friends’ faces when you tag them in a photo. Maybe like Harry Potter, a mysterious letter arrived at your doorstep on your birthday; only this letter wasn’t an invitation to the esteemed Hogwarts Academy of Wizardry and Witchcraft, it was a picture of your license plate as you sped through an intersection. A fast- growing segment of artificial intelligence known as computer vision is responsible for both scenarios, as well as a host of other applications you will likely become familiar with in the near future.

The applications of computer vision are endless, both in utility and technical impressiveness, and if you haven’t already, it’s about time you began to witness the power that modern computing technology affords you. The time for painstakingly plodding a path through the dense mathematical forest of how exactly your phone can add funds to your bank account simply by taking a picture of your check has come and gone. Instead, let’s quickly cover only the basic yak-shaving required to spark your interest in how to get from zero to sixty, without the ticket to match.


The tool of choice to foray into how to see the world like a robot is OpenCV. This Python module is a virtual Swiss Army knife that will outfit our computers with bionic abilities. To do so however, we must first overcome setup and installation, which has become much easier than in years past. Depending on your machine and operating system, it should not take an average user with a novice to intermediate level of coding experience any more than 30 minutes, but if there are complications that your computer can’t stomach at first, be patient and in under an hour it will be worth it.


A few prerequisites to installing OpenCV are Matplotlib, SciPy, and NumPy. Downloading and installing the binary distributions of SciPy and NumPy, and Matplotlib from the source are the way to go. The installations of OpenCV change with the regularity you would expect from maintaining a large codebase, so check for the latest download instructions on the OpenCV website. Any other prerequisites that your system needs will be asked for during the setup process.


Most distributions of Linux will have NumPy preinstalled, but for the latest versions of both SciPy and NumPy, using a package manager like apt-get should be the easiest route. As far as OpenCV, the path of least resistance is to consult with the well-maintaind OpenCV Docs. This resource will walk you through the installation, as well as certain caveats and common troubleshooting gripes.


If you have OSX 10.7 and above, NumPy and SciPy should come preinstalled. All of the main sources mentioned above will cover prereqs, and as for OpenCV itself, Homebrew is the most convenient solution. If you don’t have it installed already, head over to the Homebrew ( package manager. In most cases, once brew is installed, the instructions boil down to these basic commands: brew doctor , followed by, brew install opencv , or in error-prone cases, brew install opencv — env=std . In some instances, you may be asked by Homebrew to update a PYTHONPATH, which may involve opening the new (or existing) .bash_profile file in the text editor of your choice, and saving export PYTHONPATH=/usr/local/lib/python2.7/sitepackages:$ PYTHONPATH , or something like that there.

Patiently await the downloads and you should soon have everything installed! Check your installation by launching your Python interpreter with an import cv2 command, and there should be no error messages.

OpenCV Basics

The very basic gist behind OpenCV, along with NumPy, is that they are using multi-dimensional arrays to represent the pixels, the basic building blocks of digital images. If an image resolution is 200 by 300, it would contain 60,000 pixels, each of varying intensities along the RGB scale. You may have seen the RGB triple expressed similar to this (255,0,0) when dealing with a digital color palette from graphic design software or online image editor. Each pixel is assigned like this, and together they form an image, which can be represented as an entire matrix. This matrix of RGB tuples is what OpenCV is good at manipulating.

For this project, we’re going to examine some of the features that, when combined, can lead to some really interesting applications right out of the box. I’d like to see how accurately I can measure the size of some arbitrary objects in a photo.

Getting Started

Since my son left them on the floor, and I stepped on them, I’ve taken a picture of his Hotwheels Tesla car, and a little birdie thing. To make this experiment more straightforward, I’ve added a reference object (a penny), whose size we already know in the image as well. From this reference object and the resulting pixel_to_size ratio, we’ll determine the sizes of other objects in the image.

The basic setup is to be able to run our script from the command line by feeding it the desired image, then it finds the object or objects of interest in the image, bounds them with a rectangle, measures the width and height of the images, along with some visible guides, and displays the output, right to our screen. You may need to pip install imutils or easy_install imutils, which is a package that makes manipulating images with OpenCV and Python even more robust.

Name a file, and input this code:

from scipy.spatial import distance as dist
from imutils import perspective
from imutils import contours
import numpy as np
import argparse
import imutils
import cv2
# construct the argument and parse command line input
aparse = argparse.ArgumentParser()
aparse.add_argument("--image", required=True,help="image p
aparse.add_argument("--width", type=float, required=True,h
elp="width of far left object (inches)")
args = vars(aparse.parse_args())
# load the image, convert it to grayscale, and blur it a b
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (7, 7), 0)

We first create a way to let the script know which image we want to use. This is done using the argparse module. We tell it that we are inputting an image, followed by our reference width. I’ve used a penny for our reference width, and Wikipedia tells me that our 16th president, Abraham Lincoln’s copper bust, measures 0.75 inches across. When we finally run the script on the command line, we’ll use this format: python –image hotwheel.png –width 0.75 . This argument parser is quite reusable, especially for future machine learning projects that you might encounter.

# perform edge detection + dilation + erosion to close gap
s bt edges
edge_detect = cv2.Canny(gray, 15, 100) #play w/min and max
values to finetune edges (2nd n 3rd params)
edge_detect = cv2.dilate(edge_detect, None, iterations=1)
edge_detect = cv2.erode(edge_detect, None, iterations=1)

Edge detection, dilation, and erosion are methods that will no doubt pop up on most image manipulation/computer vision tasks. A great habit to begin mastering and crafting your own projects, is to dive into the more important methods used under the hood by studying the source documentation. Edge detection can be a complex subject if you want it to be. It’s one of the building blocks of computer vision, and should raise your curiosity if you like looking under the hood to find out how things work. The OpenCV docs, while definitely having an old-school vibe, are actually pretty detailed and informative. What we’ve done with the gray variable was to turn our image grayish, helping define the edges and contours. Now, the Canny method, named after its founder John F. Canny, uses a combination of noise reduction and something called intensity gradients to determine what are continuous edges of an object, and what is probably not. If you want to see what our poor man’s Terminator sees at this point, you could just display edge_detect by adding cv2.imshow(‘Edges’,edge_detect) . It would look something like this:

If you use your imagination a bit, you can start to see how Cyberdyne Systems was able to have the T1000 identify motorcycles, leather jackets, and shotguns in the future.

# find contours in the edge map
cntours = cv2.findContours(edge_detect.copy(), cv2.RETR_EX
cntours = imutils.grab_contours(cntours)
# sort contours left-to-right
(cntours, _) = contours.sort_contours(cntours)
pixel_to_size = None
# function for finding the midpoint
def mdpt(A, B):
return ((A[0] + B[0]) * 0.5, (A[1] + B[1]) * 0.5)

The findCountours method further identifies what we would consider contours of various whole objects in our image. We sort them left-toright, starting with our reference penny. Knowing that the penny goes first, we can use our pixel_to_size ratio to find out the sizes of the other objects. We’ve just initialized the penny ratio here, and we’ll use it later. Lastly, we create a function to find the middle of the object lines that we’ll draw later, so keep that in mind.

# loop over the contours individually
for c in cntours:
if cv2.contourArea(c) < 100: #ignore/fly through cont
ours that are not big enough
# compute the rotated bounding box of the contour; sho
uld handle cv2 or cv3..
orig = image.copy()
bbox = cv2.minAreaRect(c)
bbox = if imutils.is_cv2() else
bbox = np.array(bbox, dtype="int")
# order the contours and draw bounding box
bbox = perspective.order_points(bbox)
cv2.drawContours(orig, [bbox.astype("int")], -1, (0, 2
55, 0), 2)

Everything else in this script runs under this for loop. Our contours now define what we think to be the isolated objects within the image. Now that that’s complete, we make sure that only contours/objects that have an area larger than 100px will stay to be measured. We define bounding boxes as rectangles that will fit over  the  objects,  and  turn them into Numpy arrays. In the last step  we  draw  a  green  bounding box. Note that OpenCV reverses the order of Red, Green, and Blue, so   Blue is the first number in the tuple, followed by Green, and Red.

Basically, all that’s left is to draw our lines and bounding points, add midpoints, and measure lengths.

# loop over the original points in bbox and draw them; 5px
red dots
for (x, y) in bbox:, (int(x), int(y)), 5, (0, 0, 255),
# unpack the ordered bounding bbox; find midpoints
(tl, tr, br, bl) = bbox
(tltrX, tltrY) = mdpt(tl, tr)
(blbrX, blbrY) = mdpt(bl, br)
(tlblX, tlblY) = mdpt(tl, bl)
(trbrX, trbrY) = mdpt(tr, br)

Here’s where we use our midpoint function, mdpt . From our four bounding box points that enclose our object, we’re looking for half-way between each line. You see how easy it is to draw circles for our bounding box points, by using the command. Without cheating, can you tell what color I’ve made them? If you guessed Blue… you’re wrong! Yep, Red – There’s that order reversal that OpenCV likes to use. Red dots, 5px big. When you run the code yourself, change some of these parameters to see how it alters what we’re drawing or how the bounding boxes might get thrown off by poor countours, etc.

# draw the mdpts on the image (blue);lines between the mdp
ts (yellow), (int(tltrX), int(tltrY)), 5, (255, 0,
0), -1), (int(blbrX), int(blbrY)), 5, (255, 0,
0), -1), (int(tlblX), int(tlblY)), 5, (255, 0,
0), -1), (int(trbrX), int(trbrY)), 5, (255, 0,
0), -1)
cv2.line(orig, (int(tltrX), int(tltrY)), (int(blbrX),
int(blbrY)),(0, 255, 255), 2)
cv2.line(orig, (int(tlblX), int(tlblY)), (int(trbrX),
int(trbrY)),(0, 255, 255), 2)
# compute the Euclidean distances between the mdpts
dA = dist.euclidean((tltrX, tltrY), (blbrX, blbrY))
dB = dist.euclidean((tlblX, tlblY), (trbrX, trbrY))

Not much going on here except drawing blue midpoints of lines, 5px big. dA and dB are a bit more interesting, because we are computing the distance between bounding box points. We did this with the euclidean() method of the dist object that we imported from the SciPy library at the start of our script.

On to the finale:

# use pixel_to_size ratio to compute object size
if pixel_to_size is None:
pixel_to_size = dB / args["width"]
distA = dA / pixel_to_size
distB = dB / pixel_to_size
# draw the object sizes on the image
cv2.putText(orig, "{:.1f}in".format(distA),
(int(tltrX - 10), int(tltrY - 10)), cv2.FONT_HERSH
EY_DUPLEX,0.55, (255, 255, 255), 2)
cv2.putText(orig, "{:.1f}in".format(distB),
(int(trbrX + 10), int(trbrY)), cv2.FONT_HERSHEY_DU
PLEX,0.55, (255, 255, 255), 2)
# show the output image
cv2.imshow("Image", orig)

Here’s where the magic happens. We can now employ our penny ratio to find the size of the other objects. All we need is to use one line divided      by our ratio, and we know how long and  wide  our  object  is.  It’s  like using a map scale to convert an inch into a mile. Now we superimpose the distance text over our original image (which is actually a copy of our original image). I’ve rounded this number to one decimal place, so that would explain why our result shows our penny as having a height and width of 0.8 inches. Rest assured skeptics, it has rounded up  from  a perfect 0.75 inches; of course, you should change the accuracy to two decimal places yourself, just to make sure. Our last two lines are commands to display our image and rotate through the drawn bounding boxes on any key press.


I told you we would dive right in. You may want to try snapping a similar photo of your own and tinkering with many of these reusable code snippets. Of particular interest are these methods, that will popup again and again in your future computer vision projects:

  • cv2.cvtColor() for graying images
  • cv2.Canny() for edge detection
  • cv2.findContours() for whole object detection
  • cv2.boxPoints() for creating bounding boxes (CV3)
  • and cv2.line() for drawing shapes
  • cv2.putText() for writing text over images

As you can see, the world of computer vision is unlimited in scope and power, and OpenCV unlocks the ability to use machine learning to perform tasks that have traditionally been laborious and error-prone for humans to do en masse, like detect solar panel square footage for entire zip codes, or define tin rooftops in African villages. We encourage you to empower yourselves by diving in and experimenting.

Exploring Baseball Data to Predict Run Totals

Table of Contents

  • Overview
  • Support Vector Machines (SVM) The Data
  • Setup and Installation
  • Exploring and Modeling the Data Takeaways


The object of this tutorial is to give you an idea of what a process might look like when a data scientist explores data with the intention of using machine learning to make predictions. Crafting a strategy to classify or regress data involves equal parts art and science, and trial and error and parameter tuning are all part of the game. For this exercise, I’ve chosen a stage ripe with data: baseball. Of course, in order to gain the most value, to be able to integrate this process into whatever projects you’re working on, you should download the data and follow along with the code, and experiment by making changes to anything you can get your hands on.

Support Vector Machines (SVM)

Before we get started, let’s first procrastinate a bit, and digest the basics of the ML algorithm we’re gonna use. Support Vector Machines are supervised machine learning algorithms that have shown success both in classification and regression since the 1960s. They have a clear method of operation that can be applied to complex problems, and for this reason, as well as their accuracy, they remain popular in modern fields, such as bioinformatics, optimization, and algorithmic trading.

The main theme behind the workings of this machine learning algorithm, is the separation of data points in n-dimensional space, by what are called hyperplanes. Each of your features represents one n- dimension. Hyperplanes are basically separators that try to separate your classes of data into different groups, first by similarity, then by distance from one another. The object of the separation is to find the plane that maximizes the distance between classes of your data. The kicker is that during the training of your model, if there is no plane that can effectively separate the classes of data, what is known as a kernel is used to perform quite complex transformations of the data, increasing the dimensions of the data to find better hyperplanes.

SVM kernels can be linear or non-linear, so they can take complex forms that can attempt to morph to your data, often creating interesting planes of demarcation. Because they can take so many forms while finding an optimized hyperplane, tuning model parameters can lead to significantly different results. Later on, we’ll go over some solutions for how to attack this issue.

The Data

A model can’t do much without data, so let’s discuss ours, and then take a look. As it turns out, baseball can be viewed as a simple game that includes a bat, a ball, a hitter, a pitcher, and some other players; that doesn’t sound too complicated. When constructing my model, I realize that it will never replicate reality, instead, we look to simplify reality as much as possible while maximizing the accuracy of predictive outcomes. That being the case, when constructing an accurate model, the mystery is in pinning down those predictive variables which are most responsible for the outcomes. There are many ways to do this, and we will not go over them here, but you should definitely research feature selection if you want to build better models. I’ve done my own feature selection, that mostly concentrates on the pitcher-batter interaction, environment, and how well the ball will fly when hit.

The goal of each pitcher-batter interaction revolves around the fundamental action of hitting, and that involves the flight of the ball before and after a hit. Turns out that the flight of a ball is kinda complicated. After looking through some historical baseball data sources like Baseball Reference and others, and a little feature extraction, I’ve come up with some factors that I think might be important. Some of these features are taken directly from the data sources, and others are derived features, like air_dens. As I found out, the flight of the ball has to do with factors such as altitude and temperature. At higher altitudes and higher temperatures, air molecules are less densely packed, allowing balls to fly farther.

Download the dataset here

Setup and Installation

You should have Scikit Learn, Matplotlib, SciPy, NumPy, Pandas, IPython/Jupyter installed to follow along. It seems like a lot, but it’s not really much, and each of the sources provides excellent documentation   to get up and running on your system. Moreover, it’ll all be up to date. We’re using Python, so surely you’ll need that.

Exploring and Modeling the Data

By using the Pandas command columns, we can view the columns of our dataset. Explore the size and general lay of the land with the DATAFRAME_NAME.describe() command – in this case it would be df_mlb.describe(). Looks like there are almost 600  rows  of  data, much of which is binary (zeros and ones), and  categorical  in  nature. We’ve already made determinations as to whether the park is a pitcher’s park or a hitter’s park, whether the game  was  interleague  or  not, whether the game was in a dome, whether the home team had lost the    last four games in a row (L4r_h), etc. Other data is continuous, like air_dens (air density),  temp (temperature), etc. For this project, what’s important is how we go about predicting the total score of the game using SVM.

Preferably, open up a new Jupyter notebook, or a text editor if you don’t have Jupyter Notebook. Alternatively, Google Colaboratory may make setup a bit easier. In one cell, enter this:

cols_to_transform = ['3ormore', u'avd_sun_sweep',
'bott25rec_h', 'bott25rec_rd','elite_h',
'elite_rd','fatigue_h', 'fatigue_rd',
'hfelite_h', 'hfelite_rd', 'intrlg', 'L4r_h',
'L4r_rd', 'LLby8', 'notenuf', 'notenuf_mild',
df_mlb_dummies = pd.get_dummies(cols_to_transform)
clean_mlb = df_mlb.replace(['yes', 'no'], [1, 0])
clean_mlb = clean_mlb.fillna(0)

What we’ve done here is to create a list, cols_to_transform, that takes all the columns we want to make sure are categorized in a binary fashion. This will ensure that our SVM classifier can view our data as a matrix. We replace any ‘yes’ and ‘no’ categorizations with numerical 1s and 0s respectively. Let’s bring in the heavy lifters by importing the necessary modules. This is typically done at the top of our file:

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from from lm = sklearn import svm
sklearn.model_selection LinearRegression() import train_test_split
from sklearn.model_selection import StratifiedShuffleSplit
from sklearn.model_selection import GridSearchCV

You should get no error messages on a successful import, but if you do, download any packages that it indicates might be missing. Next, let’s construct our classifier. Before we do so, let’s quickly browse the documentation for a list of parameters for our classifier.

sklearn.svm.SVC(C=1.0, kernel=’rbf’, degree=3,
gamma=’auto_deprecated’, coef0=0.0, shrinking=True,
probability=False, tol=0.001, cache_size=200,
class_weight=None, verbose=False, max_iter=-1,
decision_function_shape=’ovr’, random_state=None)

As you can see, there are quite a few parameters in there, dealing with kernels, gamma, degrees, etc. You could browse the docs to become more informed on those parameters – they are important. But right now, let’s not lose momentum on making baseball total score predictions.

Let’s get our classifier up and running by using a less complicated version so we can see some action:

clf = svm.SVC(gamma=0.0001, C=256,tol=0.001)
from sklearn.preprocessing import MinMaxScaler
scaling = MinMaxScaler(feature_range=(-1,1)).fit(clean_mlb[mlb_features][300:586])
X_train = scaling.transform(clean_mlb[mlb_features][300:586])

I’ve chosen to give a relatively large penalty © for error, and a gamma (kernel coefficient) of 1/1000th. This gamma parameter has to do with the sensitivity to differences in feature vectors. It depends on the degree of dimensions among other things. If gamma is too large, you may overfit your data, meaning that you will be fitting data too closely to your training set instead of fitting the data to structures that will accurately predict real-life, ground truth data. To help with support vector classification, we’ve scaled and transformed the data into an array that can be easily interpreted by the SVM. The transformed data is saved in the variable X_train.

Why did I just save rows 300 to 586? This was my way of splitting the data into training and testing sets. Sometimes, you have lots of data, and other times, with less data available, you have to use part of your data to train (X), and the rest to test dependent variables (y). There are other ways to do this, but this will work for our purposes. Now we’d like to fit the training data to our classifier:[mlb_features][300:586],clean_mlb.t_score_fg[300:586])

Our dependent/outcome variable that we are attempting to predict, is the total score of the game, t_score_fg . Lastly, in order to see our predictions and the real outcomes of the games side-by-side, we need to construct a dataframe that displays prediction and outcome in the same table:

desc = df_mlb.ix[201:300][['desc','wind','air_dens','dev',
desc = desc.fillna(0)
predict = pd.DataFrame(clf.predict(test_mlb[mlb_features][201:300]))
combined = pd.concat([predict,desc],axis=1)
combined.rename(columns={0: 'predict_svc'}, inplace=True) combined

Now, predict_svc gives the predictions, and t_score_fg gives the actual totals. Okay, but were these predictions any good? That depends on what you consider good. The good news is we don’t have to use our own judgement to determine the error in our model. Sklearn has a quick little method for determining the Mean Absolute Error (MAE), or the average error over the data set:

from sklearn.metrics import mean_absolute_error
y_pred = predict
y_true = test_mlb.t_score5[201:300]
mean_absolute_error(y_true, y_pred)

Using these parameters, it looks like the average total prediction is off by over 4 runs. How could we reduce this? Well, one place to start is with the parameters. While I basically arbitrarily chose the parameters, you could save time finding good parameters by having the computer do the heavy lifting. Sklearn has tools for this:

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold
from sklearn.model_selection import GridSearchCV
X, Y = clean_mlb[mlb_features][300:500],clean_mlb.t_score5[300:500]
# It is usually a good idea to scale the data for SVM trai ning.
# We are cheating a bit in this example in scaling all of the data
scaler = StandardScaler()
X = scaler.fit_transform(X)
# For an initial search, a logarithmic grid with basis
# 10 is often helpful. Using a basis of 2, a finer
# tuning can be achieved but at a much higher cost.
C_range = 2. ** np.arange(1,11)
gamma_range = 10. ** np.arange(-10, -1)
param_grid = dict(gamma=gamma_range, C=C_range)
grid = GridSearchCV(svm.SVC(), param_grid=param_grid, cv=S
tratifiedKFold()), Y)
print("The best classifier is: ", grid.best_estimator_)

This process uses grid search to find the optimal parameters to use in our SVM model. Now, if you retrain our SVM using the new, suggested clf variable, you’ll get a better model. Here are the newly suggested parameters:

clf = svm.SVC(C=2.0, cache_size=200, class_weight=None, coef0=0.0,
decision_function_shape='ovr', degree=3, gamma=1e-10, kernel='poly',
max_iter=-1, probability=False, random_state=None, shrinking=True,
tol=0.001, verbose=False)

You’ll find that it does achieve a lower MAE, but if you run the prediction again you’ll find that the same prediction is made for every game. What that means is that, given our predictive variables, our SVM model finds that you achieve the least model error by predicting a total score of 7 for every game. Well, that’s kind of a bummer, but then again, it means that our model is working pretty darn well. After all, in real baseball, a final score of around 7 or 8 could be considered the average total score. That bodes well for a nice foundation to build upon in the future.


Perhaps you’d like to use a Raspberry Pi or Arduino to predict the amount of water your plants need, or to determine how much food your robot food dispenser needs to give to your dogs, based on season or temperature. You could build a drone that uses a camera and an SVM to classify images into groups. The utility and useability of SVMs is high, and they run fairly fast compared to other similarly performing machine learning algorithms. We encourage you to create and experiment with this data set, and then create your own, or find some interesting publicly-available sets out there to dig into: wearable data, maps data, or whatever you can dream up.