Categories
Career Programming

Programmer interview challenge 2, part 2: Functional FizzBuzz

(In case you missed it, here’s the previous article in this series.)

FizzBuzz became popular in the late 2000 – 2010 decade, which was around the same time that may programmers were beginning to rediscover functional programming. I say rediscover rather than discover because functional programming goes back all the way to Lisp, whose spec was written in 1958, which is the dawn of time as far as modern computing is concerned. As Wikipedia puts it, Lisp is “the second-oldest high-level programming language in widespread use today. Only Fortran is older, by one year.”

(Both Fortran and Lisp are heavily based in mathematics — in fact, Fortran is short for FORmula TRANslation. This is one of the reasons that there’s a strong math bias in programming to this day.)

One senior developer I know tested prospective developers’ functional programming skills by issuing this test to anyone who passed the original FizzBuzz test:

Write FizzBuzz, but this time, instead of FizzBuzzifying the numbers 1 through 100, FizzBuzzify the contents of an array, which can contain any number of integers, in any order.

(The senior developer didn’t use the word “FizzBuzzify,” but I think you get my point.)

The resulting app, if given this array…

[30, 41, 8, 26, 3, 7, 11, 5]

…should output this array:

[‘FizzBuzz’, 41, 8, 26, ‘Fizz’, 7, 11, ‘Buzz’]

Note that the original array contained all integers, while the result array can contain both strings and integers. The senior developer was interviewing programmers who’d be working in Ruby, where you can easily use arrays of mixed types.

You’d get a passing grade if your solution simply adapted the original FizzBuzz to take an array as its input. Here’s a Python implementation of that solution:

def fizzBuzz_list_imperatively(numbers):
  finalResult = []

  for number in numbers:
    currentResult = None
    isMultipleOf3 = (number % 3 == 0)
    isMultipleOf5 = (number % 5 == 0)

    if isMultipleOf3 and isMultipleOf5:
      currentResult = "FizzBuzz"
    elif isMultipleOf3:
      currentResult = "Fizz"
    elif isMultipleOf5:
      currentResult = "Buzz"
    else:
      currentResult = number

    finalResult.append(currentResult)

  return finalResult

However, the developer was looking for a more functional approach. In functional programming, if you’re being asked to perform some kind of calculation based on the contents of a list, you should probably use a map, filter, or reduce operation.

In case you’re not quite familiar with what these are, here’s a simple explanation that uses emojis:

Map, filter, and reduce explained with emoji.
Much nicer than a dry textbook explanation, isn’t it?

The map operation, given a list and a function, applies that function to every item in the given list, which creates a new list. The senior developer granted bonus points to anyone who came up with a map-based solution.

Here’s a Python implementation of what the senior developer was looking for:

def fizzBuzz_list_functionally(number_list):
  return list(map(fizzBuzzify, number_list))

def fizzBuzzify(number):
  isMultipleOf3 = (number % 3 == 0)
  isMultipleOf5 = (number % 5 == 0)

  if isMultipleOf3 and isMultipleOf5:
    return "FizzBuzz"
  elif isMultipleOf3:
    return "Fizz"
  elif isMultipleOf5:
    return "Buzz"
  else:
    return number

This implementation breaks the problem into two functions:

  • A fizzBuzzify() function, which given a number, returns Fizz, Buzz, FizzBuzz, or the original number, depending on its value, and
  • A map() function, which applies fizzBuzzify() across the entire array.

Remember, the senior developer was looking for Ruby developers, and Ruby doesn’t support nested functions. Python does, however, and I like packaging things neatly to prevent errors. I think that if a function a makes exclusive use of another function b, you should nest b inside a.

With that in mind, let’s update fizzBuzz_list_functionally():

def fizzBuzz_list_functionally(number_list):

  def fizzBuzzify(number):
    isMultipleOf3 = (number % 3 == 0)
    isMultipleOf5 = (number % 5 == 0)

    if isMultipleOf3 and isMultipleOf5:
      return "FizzBuzz"
    elif isMultipleOf3:
      return "Fizz"
    elif isMultipleOf5:
      return "Buzz"
    else:
      return number

  return list(map(fizzBuzzify, number_list))

What’s next

In the next installment in this series, we’ll look at FizzBuzz solutions that don’t use the modulo (%) operator.

Previously, in the “Programmer interview challenge” series

Categories
Career Programming

Programmer interview challenge 2: The dreaded FizzBuzz, in Python

The “job interview” scene from Good Will Hunting.

Spotting the programmers who can’t program

Sooner or later, you will encounter — or worse still, end up working with — a programmer who only seems good at programming. This person will have an impressive-looking resume. They’ll know all the proper terminology, be able to speak intelligently about the key concepts in this programming language or that framework or library, and may even have given a pretty good talk at a meetup or conference. But when they’re put to the task of actually writing working software, they just can’t do it.

These aren’t programmers who have difficulty taking on big problems, such as the kind you run into when working on complex problems and writing all-new code from scratch. They’re not even programmers who run into trouble just working on the sort of everyday problems that you encounter maintaining established, working software. These are programmers who can’t solve simple problems, the sort that you should be able to do during a lunch or coffee break. They might be good in other roles in the development process, but not in one where they have to write production code.

As a result, there’s a category of little assignments whose sole purpose isn’t to identify great programmers, or even good ones, but to spot the ones you shouldn’t hire. The best known of these is the dreaded FizzBuzz.

FizzBuzz, explained

Fizzbuzz is an exercise that then-developer Imran Ghory wrote about back in 2007. The programmer is asked to implement a program — often in the language of their choice — that prints the numbers from 1 to 100, but…

  1. If the number is a multiple of 3, print Fizz instead of the number.
  2. If the number is a multiple of 5, print Buzz instead of the number.
  3. If the number is a multiple of both 3 and 5, print FizzBuzz instead of the number.

Simply put, its output should look something like this:

1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz.

In his blog entry on FizzBuzz, Imran wrote:

Most good programmers should be able to write out on paper a program which does this in a under a couple of minutes.

Want to know something scary ? – the majority of comp sci graduates can’t. I’ve also seen self-proclaimed senior programmers take more than 10-15 minutes to write a solution.

I’m not saying these people can’t write good code, but to do so they’ll take a lot longer to ship it. And in a business environment that’s exactly what you don’t want.

Let’s look at a couple of Python implementations.

The dumb-as-a-bag-of-hammers solution

If you think your interviewer has a sense of humor, you might try throwing this solution at them. I put this in a file named fizzbuzz.py:

def fizzBuzzDumb():
  return "1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz."

Note that I wrote this as a function that returns a string instead of as just a print statement. There’s a reason for this — as a function, it’s testable.

Let’s create a file for FizzBuzz tests. I called mine test_fizzbuzz.py. It’s also pretty dumb — all it does it confirm that fizzBuzzDumb() spits out the right result. It’s pretty much guaranteed to pass, since I copied and pasted the string from fizzBuzzDumb() into the constant that the test uses to confirm that the output is correct:

import pytest
from fizzbuzz import fizzBuzzDumb

fizzBuzz1To100Result = "1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz."

def test_dumb_fizzBuzz():
  result = fizzBuzzDumb()
  assert result == fizzBuzz1To100Result, f"The dumb solution returned the wrong result:\nExpected: {fizzBuzz1To100Result}\nActual: {result}."

With fizzbuzz.py and test_fizzbuzz.py in hand, I ran the test, and it unsurprisingly passed:

Working towards a real solution

The multiple problem

The first stumbling block I’ve seen people trying to write FizzBuzz is that they have no idea how to tell if a number x is a multiple of some other number y. This happens particularly often when the programmer doesn’t come from a math background.

(There’s a bit of math snobbery and bias in classical computer science. Some of it is just general academic snobbery, and some of it is from the fact that computer science wasn’t originally its own field of study in universities, but often a branch of the math or engineering department.)

To solve this problem, you want to use the modulo operator — the % symbol. It performs integer division, but instead of giving you the quotient (the result of a division), it gives you the remainder.

For example 3 % 2 gives you a result of 1. That’s because after dividing 3 by 2, you get a remainder of 1. 5 % 2 also gives you a result of 1, because 2 goes into 5 twice, leaving you a remainder of 1. 9 % 5 gives you a result of 4, as 5 goes into 9 once, leaving a remainder of 4.

If a division operation results in no remainder, % returns a result of 0. For instance 2 % 2, 4 % 2, 6 % 2, 8 % 2, and 10 % 2 all return a result of zero, since they’re all even numbers, which are all evenly divisible by 2. Another way of putting it is to say that they’re multiples of 2.

With this in mind, we can easily come up with a couple of statements that test if a number is a multiple of 3 and if a number is a multiple of 5:

isMultipleOf3 = (number % 3 == 0)
isMultipleOf5 = (number % 5 == 0)

Fizz, Buzz, FizzBuzz, or number?

This is the part that really trips up the weak programmers. It doesn’t follow this simple decision structure:

  • If condition A is met:
    • Perform task X.
  • Otherwise, if condition B is met:
    • Perform task Y.
  • Otherwise, if condition C is met:
    • Perform task Z.
  • Otherwise, if none of the above conditions are met:
    • Perform the default task.

FizzBuzz’s decision structure is more like this:

  • If condition A is met:
    • Perform task X.
  • Otherwise, if condition B is met:
    • Perform task Y.
  • But if both condition A and B are met:
    • Don’t perform task X or Y. Perform task Z instead.
  • Otherwise, if none of the above conditions are met:
    • Perform the default task.

This seems like a minor change, but it’s enough to make FizzBuzz a way of filtering out people might have serious trouble writing production software.

I’ve demonstrated live-coding Fizzbuzz in a number of presentations, and the approach I usually take is summarized in this Python code:

if isMultipleOf3 and isMultipleOf5:
  currentResult = "FizzBuzz"
elif isMultipleOf3:
  currentResult = "Fizz"
elif isMultipleOf5:
  currentResult = "Buzz"
else:
  currentResult = str(number)

At the end of this if statement, currentResult contains one of the following: FizzBuzz, Fizz, Buzz, or a number.

When live-coding in front of an audience — which is pretty much what a technical interview is — you want to keep a couple of things in mind:

  • You want to use the code to communicate your intent to the audience as clearly as possible.
  • Complexity is your enemy. You want to make the simplest thing that works.

My approach was to do handle the trickiest case first. My if statement handles the case where the number is both a multiple of 3 and a multiple of 5 first, followed by the individual multiple cases, followed by the default case. It’s simple, it’s easy to follow, and best of all, it works.

Putting it all together

Here’s the fizzBuzz() function that I wrote. I put it in fizzbuzz.py, just after the definition of fizzBuzzDumb():

def fizzBuzz(first = 1, last = 100):
  finalResult = ""

  for number in range(first, last + 1):
    currentResult = ""
    isMultipleOf3 = (number % 3 == 0)
    isMultipleOf5 = (number % 5 == 0)

    if isMultipleOf3 and isMultipleOf5:
      currentResult = "FizzBuzz"
    elif isMultipleOf3:
      currentResult = "Fizz"
    elif isMultipleOf5:
      currentResult = "Buzz"
    else:
      currentResult = str(number)

    finalResult += currentResult

    if number < last:
      finalResult += ", "
    else:
      finalResult += "."

  return finalResult

Here’s the full text of fizzbuzz.py:

def fizzBuzzDumb():
  return "1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz."

def fizzBuzz(first = 1, last = 100):
  finalResult = ""

  for number in range(first, last + 1):
    currentResult = ""
    isMultipleOf3 = (number % 3 == 0)
    isMultipleOf5 = (number % 5 == 0)

    if isMultipleOf3 and isMultipleOf5:
      currentResult = "FizzBuzz"
    elif isMultipleOf3:
      currentResult = "Fizz"
    elif isMultipleOf5:
      currentResult = "Buzz"
    else:
      currentResult = str(number)

    finalResult += currentResult

    if number < last:
      finalResult += ", "
    else:
      finalResult += "."

  return finalResult

This new function calls for a new test. Here’s the updated test_fizzbuzz.py:

import pytest
from fizzbuzz import fizzBuzzDumb, fizzBuzz

fizzBuzz1To100Result = "1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz, 16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29, FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43, 44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz, 58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71, Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz, 86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz, Buzz."

def test_dumb_fizzBuzz():
  result = fizzBuzzDumb()
  assert result == fizzBuzz1To100Result, f"The dumb solution returned the wrong result:\nExpected: {fizzBuzz1To100Result}\nActual: {result}."

def test_fizzBuzz():
  result = fizzBuzz()
  assert result == fizzBuzz1To100Result, f"The dumb solution returned the wrong result:\nExpected: {fizzBuzz1To100Result}\nActual: {result}."

With fizzbuzz.py and test_fizzbuzz.py revised, I ran the test, and it passed again:

You can download fizzbuzz.py and test_fizzbuzz.py here (2KB, zipped folder with 2 Python files).

What’s next

In the next installment in this series, we’ll look at a different approach to coding FizzBuzz: functional programming!

Recommended reading

Previously, in the “Programmer interview challenge” series

Categories
Career Programming

Programmer interview challenge 1, revisited: Revising “Anagram” in Python and implementing it in JavaScript

In the previous article, I wrote about a challenge that’s often given to programmers at a tech interview: Write a program that determines if two words are anagrams. I posted a solution in Python; check it out here. In this article, we’ll refactor the Python code and write a JavaScript implementation.

Refactoring the Python version

Looking at the code for anagram(), it’s quite clear that it isn’t DRY (Don’t Repeat Yourself), but manifestly the opposite: WET (Write Everything Twice)!

Under the time constraints of a technical interview, you might not always have the time or cognitive bandwidth to keep your code DRY, but if you should try to do so if possible. You may find that it helps convey your algorithmic thinking more effectively to the interviewer, and that’s what you want. After all, your goal throughout the process is to prove that you can actually program.

The repeated code is the part that takes a string, sorts its characters into alphabetical order, and removes the leading space if it exists. Let’s turn that code into its own method:

def sortLetters(word):
  # Returns the given word with its letters sorted
  # into alphabetical order and with any
  # leading space removed.
  word_lowercase = word.lower()
  return ''.join(sorted(word_lowercase)).lstrip()

With this method defined, we can use it in anagram(). In fact, we can nest it within anagram(). Here’s the revision:

def anagram(first_word, second_word):
  # Returns True if the given words are made of the exact same characters,
  # ignoring capitalization and spaces.

  def sortLetters(word):
    # Returns the given word with its letters sorted
    # into alphabetical order and with any
    # leading space removed.
    word_lowercase = word.lower()
    return ''.join(sorted(word_lowercase)).lstrip()

  return sortLetters(first_word) == sortLetters(second_word)

Creating the sortLetters() method doesn’t just DRY up the code, but helps the method better convey what it does. Now, what anagram() does is very clearly conveyed by its return statement: it tells you if the first word with its letters sorted is the same as the second word with its letters sorted.

I confirmed that this refactored code works by running the tests, which show just how useful having tests is.

Implementing anagram() in JavaScript

Here’s anagram() in JavaScript:

function anagram(firstWord, secondWord) {
  
  function sortLetters(word) {
    return word
      .toLowerCase()
      .split('')
      .sort()
      .join('')
      .trim()
  }

  return sortLetters(firstWord) === sortLetters(secondWord)
}

Note that the JavaScript version of sortLetters() is structured slightly differently from the Python version. That’s because JavaScript’s sort() is an array method rather than a general function like Python’s sorted().

In the JavaScript version of sortLetters(), I use method chaining to spell out what happens to the given word step by step:

  1. Convert the word to lower case
  2. Convert that into an array of characters
  3. Sort that array
  4. Convert that array into a string
  5. Remove any trailing or leading whitespace

I could’ve written sortLetters() this way…

function sortLetters(word) {
  return word.toLowerCase().split('').sort().join('').trim()
}

…but I find that “put each method in the chain on its own line” approach more clearly conveys what I’m trying to do:

function sortLetters(word) {
  return word
    .toLowerCase()
    .split('')
    .sort()
    .join('')
    .trim()
}

Next: The Swift version!

Previously, in the “Programmer interview challenge” series

Categories
Career Programming

Programmer interview challenge 1: Anagram

An anagram is a word, phrase, or name that can be formed by rearranging the letters in another word, phrase, or name. For example, iceman is an anagram of cinema, and vice versa. Ignoring spaces and capitalizations, “Florida” is an anagram of “rod fail”.

“Anagram” is a common programming challenge that I’ve seen issued to prospective developers in technical interviews: Write a program or function that can tell if two given words are anagrams of each other. Here’s how you solve it.

The general idea

One solution to the problem is hinted at in the definition of “anagram”. Let’s look at it again:

An anagram is a word, phrase, or name that can be formed by rearranging the letters in another word, phrase, or name.

The word rearranging should be your hint. Somehow, the solution should involve rearranging the letters of both words so that you can compare them. Another word for rearranging is reordering, and when you encounter that word in programming, you should think of sorting. That’s where the solution lies:

If two words are anagrams of each other, sorting each word’s letters into alphabetical order should create two identical words. For example, if you sort the letters in cinema and iceman into alphabetical order, both will be turned into aceimn.

With that in mind, let’s try writing an anagram-detecting function. Given two strings, it should return true if they’re anagrams of each other, and false otherwise.

The Python version

This assumes that you’ve got Python and pytest installed on your system. I recommend installing the Individual Edition of the Anaconda Python distribution, followed by entering pip install pytest at the commend line.

Let’s do this the test-first way. We’ll write the test first, and then write the code. This means that we’ll want to create two files:

  • anagram.py, which will contain the actual code of the solution, and
  • test_anagram.py, which will contain test for that code.

Here’s the code for the test:

import pytest
from anagram import anagram

def test_simple_anagram():
  assert anagram('iceman', 'cinema'), "'cinema' is an anagram of 'iceman'."

…and here’s the code for the initial iteration of the anagram function:

def anagram(first_word, second_word):
  return False

To run the test, enter pytest at the command line. You should see output that looks like this:

Now that we have a failing test, let’s write code to make it pass.

Sorting the letters in a string is Python can be done by using a couple of methods:

  • sorted(), which when given a string, returns an array containing that string’s letters in ascending order. For example, sorted('cinema') returns ['a', 'c', 'e', 'i', 'm', 'n'].
  • join(), which when given a string and an array, returns a string where the elements of the array are joined by the given string. For example, '*'.join(['a', 'b', 'c']) returns 'a*b*c'.

Here’s my code:

def anagram(first_word, second_word):
  first_word_sorted = ''.join(sorted(first_word))
  second_word_sorted = ''.join(sorted(second_word))
  return first_word_sorted == second_word_sorted

Running pytest now gives a passing test:

Let’s now deal with cases with capitalization and spaces. Ideally, the anagram() method should treat “Florida” and “rod fail” as anagrams. We’ll specify this in the test:

import pytest
from anagram import anagram

def test_simple_anagram():
  assert anagram('iceman', 'cinema'), "'cinema' is an anagram of 'iceman'."

def test_complex_anagram():
  assert anagram('Florida', 'rod fail'), "'rod fail', if you ignore spaces and capitalization, is an anagram of 'Florida'."

Running pytest yields these results: 1 failed test and 1 passed test…

We can fix this through the use of another two methods:

  • lower(), which when applied to a string, converts all its letters to lowercase. For example, 'RADAR'.lower() returns 'radar'.
  • lstrip(), which when applied to a string, removes any whitespace characters from the left side. Since the space character has a lower value than any letter in the alphabet, it will always be the leftmost character in a string whose characters have been sorted into ascending order.

Here’s my revised code:

def anagram(first_word, second_word):
  first_word_lowercase = first_word.lower()
  first_word_sorted = ''.join(sorted(first_word_lowercase)).lstrip()
  second_word_lowercase = second_word.lower()
  second_word_sorted = ''.join(sorted(second_word_lowercase)).lstrip()
  return first_word_sorted == second_word_sorted

Running pytest now shows that all the tests pass:

Just to be safe, let’s add a test to make sure than anagram() returns False when given two strings that are not anagrams of each other:

import pytest
from anagram import anagram

def test_simple_anagram():
  assert anagram('iceman', 'cinema'), "'cinema' is an anagram of 'iceman'."

def test_complex_anagram():
  assert anagram('Florida', 'rod fail'), "'rod fail', if you ignore spaces and capitalization, is an anagram of 'Florida'."

def test_non_anagram():
  assert anagram('ABC', 'xyz') == False, "'ABC' and 'xyz' are not anagrams."

All test pass when pytest is run:

And trying all sorts of pairs of strings confirms what the test tells us: anagram() works!

# All of these return True
anagram('Oregon', 'no ogre')
anagram('North Dakota', 'drank a tooth')
anagram('Wisconsin', 'cows in sin')

# All of these return False
anagram('Florida', 'i oil sauna') # Try Louisiana
anagram('New York', 'on my wig') # Try Wyoming
anagram('Georgia', 'navy sin panel') # Try Pennsylvania

…and there you have it!

Next: Implementing anagram() in JavaScript, Swift, and possibly other languages.

Categories
Programming

A couple of handy Python methods that use regular expressions: “Word to initialism” and “Initialism to acronym”

Comic by xkcd. Tap to see the source.

tl;dr: Here’s the code

It’s nothing fancy — a couple of Python one-line methods:

  • word_to_initialism(), which converts a word into an initialism
  • initialism_to_acronym(), which turns an initialism into an acronym
import re

def word_to_initialism(word):
  """Turns every letter in a given word to an uppercase letter followed by a period.
  
  For example, it turns “goat” into “G.O.A.T.”.
  """
  return re.sub('([a-zA-Z])', '\\1.', word).upper()

def initialism_to_acronym(initialism):
  """Removes the period from an initialism, turning it into an acronym.

  For example, it turns “N.A.S.A.” into “NASA”.
  """
  return re.sub('\.', '', initialism)

The project and its dictionary

I’ve been working on a Python project that makes use of a JSON “dictionary” file of words or phrases and their definitions. Here’s a sample of the first few entries in the file, formatted nicely so that they’re a little more readable:

{
   "abandoned industrial site": [
      "Site that cannot be used for any purpose, being contaminated by pollutants."
   ],

   "abandoned vehicle": [
      "A vehicle that has been discarded in the environment, urban or otherwise, often found wrecked, destroyed, damaged or with a major component part stolen or missing."
   ],

   "abiotic factor": [
      "Physical, chemical and other non-living environmental factor."
   ],

   "access road": [
      "Any street or narrow stretch of paved surface that leads to a specific destination, such as a main highway."
   ],

   "access to the sea": [
      "The ability to bring goods to and from a port that is able to harbor sea faring vessels."
   ],

   "accident": [
      "An unexpected, unfortunate mishap, failure or loss with the potential for harming human life, property or the environment.",
      "An event that happens suddenly or by chance without an apparent cause."
   ],

   "accumulator": [
      "A rechargeable device for storing electrical energy in the form of chemical energy, consisting of one or more separate secondary cells.\\n(Source: CED)"
   ],

   "acidification": [
      "Addition of an acid to a solution until the pH falls below 7."
   ],

   "acidity": [
      "The state of being acid that is of being capable of transferring a hydrogen ion in solution."
   ],

   "acidity degree": [
      "The amount of acid present in a solution, often expressed in terms of pH."
   ],

   "acid rain": [
      "Rain having a pH less than 5.6."
   ],

   "acid": [
      "A compound capable of transferring a hydrogen ion in solution.",
      "Being harsh or corrosive in tone.",
      "Having an acid, sharp or tangy taste.",
      "A powerful hallucinogenic drug manufactured from lysergic acid.",
      "Having a pH less than 7, or being sour, or having the strength to neutralize  alkalis, or turning a litmus paper red."
   ],

...

}

The dictionary’s keys are strings that represent the words or phrases, while its values are arrays, where each element in that array is a definition for that word or phrase. To look up the meaning(s) of the word “acid,” you’d use the statement dictionary["acid"].

Dictionary keys are case-sensitive. For most words and phrases in the dictionary, that’s not a problem. Any entry in the dictionary that isn’t for a proper noun (the name of a person, place, organization, or the title of a work) has a completely lowercase key. It’s easy to massage a search term into lowercase with Python’s lower() method for strings.

Any entry in the dictionary that is for a proper noun is titlecased — that is, the first letter in each word is uppercase, and the remaining letters are lowercase. Once again, a search term can be massaged into titlecase in Python; that’s what thetitle()method for strings is for.

When looking up an entry in the dictionary, my application tries a reasonable set of variations on the search term:

  • As entered by the user (stripped of leading and trailing spaces, and sanitized)
  • Converted to lowercase with lower()
  • Converted to titlecase with title()
  • Converted to uppercase with upper()

For example, for the search term “FLorida” (the “FL” capitalization is an intentional typo), the program tries querying the dictionary using dictionary["FLorida"], dictionary["florida"], and dictionary["Florida"].

Looking up words or phrases made out of initials are a little more challenging because people spell them differently:

  • The Latin term for “after noon” — post meridiem — is spelled as pm, p.m., PM, and P.M.
  • Some people write the short form for “United States of America” as USA, while others write it as U.S.A.

To solve this problem, I wrote two short methods:

  • word_to_initialism(), which converts a word into an initialism
  • initialism_to_acronym(), which turns an initialism into an acronym

Here’s the code for both…

import re

def word_to_initialism(word):
  """Turns every letter in a given word to an uppercase letter followed by a period.
  
  For example, it turns “goat” into “G.O.A.T.”.
  """
  return re.sub('([a-zA-Z])', '\\1.', word).upper()

def initialism_to_acronym(initialism):
  """Removes the period from an initialism, turning it into an acronym.

  For example, it turns “N.A.S.A.” into “NASA”.
  """
  return re.sub('\.', '', initialism)

…and here are these methods in action:

# Outputs “G.O.A.T.”
print(f"word_to_initialism(): {word_to_initialism('goat')}")

# Outputs “RADAR”
print(f"initialism_to_acronym(): {initialism_to_acronym('R.A.D.A.R.')}")

Both use regular expressions. Here’s the regular expression statement that drivesword_to_initialism():

re.sub('([a-zA-Z])', '\\1.', word)

re.sub() is Python’s regular expression substitution method, and it takes three arguments:

  • The pattern to look for, which in this case is [a-zA-Z], which means “any alphabetical character in the given string, whether lowercase or uppercase”. Putting this in parentheses puts the pattern in a group.
  • The replacement, which in this case is \\1.. The \\1 specifies that the replacement will start with the contents of the first group, which is the detected alphabetical character. It’s followed by the string literal . (period), which means that a period will be added to the end of every alphabetical character in the given string.
  • The given string, in which the replacement is supposed to take place.

The regular expression behind initialism_to_acronym() is even simpler:

re.sub('\.', '', initialism)

In this method, re.sub() is given these arguments:

  • The pattern to look for, which in this case is \., which means “any period character”.
  • The replacement, which is the empty string.
  • The given string, in which the replacement is supposed to take place.
Categories
Uncategorized

Data science reading list for Monday, October 29, 2018: The worst data science article, 5 basic stats concepts you need to know, Bayes, democratization, and web scraping

A terrible “data skills” article that you should read, but only as a warning

I remember the hype that surrounded the web in the late 1990s. I also remember the copious amount of well-intentioned misinformation that made the rounds as writers attempted to capitalize on that hype. It’s now data science’s turn, if this bit of “advertorial” in Harvard Business Review — Prioritize Which Data Skills Your Company Needs with This 2×2 Matrix — is any indication.

Written by Chris Littlewood, chief innovation and product officer of filtered.com (I’m not going to help them by linking to their site), a company that purports to use AI to “lift productivity by making learning recommendations”, the article clearly highlight’s the author’s ignorance and HBR’s willingness to publish any article that has to do with data or data science. To the credit of the readers, a number of them registered with the site simply to be able to post comments pointing out how nonsensical the article was.

Treat this article as an object lesson in technology hype, as well a sign that data science skills are seen as valuable.

The 5 Basic Statistics Concepts Data Scientists Need to Know

Forget that the article mentioned above said that mathematics and statistics aren’t useful data skills — you can’t do data science without them! You’ll need to understand these 5 concepts (in addition to others):

  1. Statistical features
  2. Probability distributions
  3. Dimensionality reduction
  4. Under- and oversampling
  5. Bayesian statistics

This article in Towards Data Science provides a brief overview.

Data Skeptic: Bayesian Updating

One of the better data science podcasts out there is Kyle Polich’s Data Skeptic, which has been around since 2014 and has over 400 episodes. The podcast features short mini-episodes explaining high level concepts in data science, and longer interview segments with researchers and practitioners.

I’ve just started working my way through this podcast, and have used the example in episode 5, Bayesian Updating, to explain Bayes’ Theorem to people who avoiding studying probability and stats. Give it a listen, then check out the rest of the podcast episodes!

The Democratization of Data Science

Here’s a Harvard Business Review article on data science that’s actually worth reading:

Intelligent people find new uses for data science every day. Still, despite the explosion of interest in the data collected by just about every sector of American business — from financial companies and health care firms to management consultancies and the government — many organizations continue to relegate data-science knowledge to a small number of employees.

That’s a mistake — and in the long run, it’s unsustainable. Think of it this way: Very few companies expect only professional writers to know how to write. So why ask only professional data scientists to understand and analyze data, at least at a basic level?

Data Science Skills: Web scraping using python

Another article from Towards Data Science:

One of the first tasks that I was given in my job as a Data Scientist involved Web Scraping. This was a completely alien concept to me at the time, gathering data from websites using code, but is one of the most logical and easily accessible sources of data. After a few attempts, web scraping has become second nature to me and one of the many skills that I use almost daily.

In this tutorial I will go through a simple example of how to scrape a website to gather data on the top 100 companies in 2018 from Fast Track. Automating this process with a web scraper avoids manual data gathering, saves time and also allows you to have all the data on the companies in one structured file.

Categories
Uncategorized

Maritime DevCon: June 18th in Moncton

martime dev con

If you’re a developer out in the Maritimes, you might want to check out Derek Hatchard’s Maritime Dev Con, which takes place on June 18th in Moncton. It’s a single-afternoon, two-track conference – which means you should be able to take time out to attend it – covering a number of topics including:

  • .NET and ASP.NET
  • Java
  • iPhone development
  • Ruby
  • Python
  • Groovy
  • NoSQL and MongoDB
  • “Rockstar Estimating Skills”

Maritime Dev Con has a registration fee that won’t hurt your wallet – it’s a mere CAD$19!

I’m a big fan of small, regional gatherings like Maritime Dev Con and its western counterpart Prairie DevCon. Each region has its own specializations and needs that a by-locals, for-locals conference can do a better job of serving, and the smaller size of these conferences allows for more back-and-forth between audience and presenter, and between attendees. Support your local conference!

This article also appears in Canadian Developer Connection.