Every week, I compile a list of events for developers, technologists, tech entrepreneurs, and nerds in and around the Tampa Bay area. We’ve got a lot of events going on this week, and here they are!
Monday, November 5
- Cool ‘n Confident Toastmasters @ SPC – St. Petersburg/Gibbs Campus, 6:30 PM to 8:00 PM
- Largo Board Games Meetup — Xia: Embers of a Forsaken Star @ 7:00 PM to 9:00 PM
- Tampa Bay Bitcoin — Mining Mondays @ Tampa Bay Wave, 7:00 PM to 9:00 PM
- South Tampa Toastmasters @ Unity of Tampa, 7:00 PM to 8:15 PM
- Geekocracy! — Celebrate National Doughnut Day at ‘Datz & Dough’! @ Datz & Dough, 7:30 PM to 9:30 PM
Tuesday, November 6
I’m not eligible to vote in the U.S. (I’m a Canadian citizen here on a green card), but if you are, go vote before you do anything extracurricular today!
- Dreamit X BISNOW Innovation Summit 2018 @ Tampa Marriott Waterside Hotel & Marina, Tuesday 8:00 AM to Wednesday 5:00 PM
- Entrepreneurs & Startups – Bradenton Networking & Education — Network & Learn – Stone Soup – Finding the Power of Community @ Station 2 Innovation Center, 11:30 AM to 1:00 PM
- Westshore Toastmasters @ FIVE Labs, 12:00 PM to 1:00 PM
- Brandon Boardgamers — Tuesday Night Gaming @ Cool Stuff Games, 5:00 PM to 8:00 PM
- Learn Cybersecurity Tampa — War Games: Intro to Strategy & GRC @ SecureSet, 5:30 PM to 9:00 PM
- Tampa Bay Agile — Living Off an Agile Landscape: Agile Farming vs Gardening (Quincy Jordan) @ KForce Tampa, 6:00 PM to 8:00 PM
- Tampa Hackerspace member meeting @ Tampa Hackerspace, 6:00 PM to 7:00 PM
- Sarasota Entrepreneurs Meetup — How Can I Create a Business From Idea to Making Revenue? Learn in 6 Month Class @ John Greer Auto Sales, 6:00 PM to 8:00 PM
- Game Club Tampa Meetup — Tuesday Nite Roleplayers (RPGs) @ Grand Arena of Mind Expansion, 6:30 PM to 9:30 PM
- CLEARWATER eMarketing Groups — Internet Marketing for Business Owners @ IHOP (30200 US Hwy 19, South of Curlew Rd, Clearwater), 7:00 PM to 10:00 PM
- St. Pete Beers ‘n Board Games Meetup for Young Adults @ Flying Boat Brewing Company, 7:00 PM to 10:00 PM
Wednesday, November 7
- Open/FREE Coworking for Latino Tech Entrepreneurs @ FirstWaVE Venture Center, 8:00 AM to 11:00 AM
- 1 Million Cups St. Pete — Overcome the Barrier, Inc / Showered and Empowered, Inc. @ 9:00 AM
- 1 Million Cups Tampa — American Freedom Distillery / Raise the Bar Design @ 9:00 AM
- Suncoast Developers Guild Open House @ Suncoast Developers Guild, 12:00 PM to 2:00 PM
- TampaBay Cryptocurrency and Blockchain Technology Meetup — SOFWERX Security Summit @ SOFWERX Underground, 5:00 PM to 8:00 PM
- Code for Tampa Bay Brigade — Open Hack Night @ Entrepreneur Collaborative Center (ECC), 5:30 PM to 7:30 PM
- Tampa Bay Business Intelligence and Data Analytics — Monthly Meeting @ AgileThought, Inc, 6:00 PM to 8:00 PM
- Learn Cybersecurity Tampa — SecureSet Info Night Tampa @ SecureSet Tampa Campus, 6:00 PM to 7:00 PM
- Tampa Bay Agile — Tampa Bay Scrum Masters Guild @ KForce Tampa, 6:00 PM to 8:00 PM
- Tampa Artificial Intelligence Meetup Monthly Meeting @ Entrepreneur Collaborative Center, 6:30 PM to 8:00 PM
- Learn to Code | Thinkful Tampa — Web Development vs Data Science @ Secureset Academy, 6:30 PM to 8:00 PM
- Design St. Pete — How to give and receive design feedback @ Rising Tide Innovation Center, 6:45 PM to 8:45 PM
- Laser Cutter Orientation @ Tampa Hackerspace, 7:00 PM to 9:00 PM
- St Pete .NET Meetup — .NET Core and Containers with Shayne Boyer @ Bank of the Ozark’s Innovation Lab, 7:00 PM to 9:00 PM
- Nerdbrew Events — Games & Grog @ Peabody’s!, 7:00 PM to 11:00 PM
- Women In Linux — Understanding Linux @ 7:00 PM to 9:00 PM
Thursday, November 8
- The Green Asterisk Coworking @ The Pearl, 11:00 AM to 4:00 PM
- Coffee & Code @ Pour House at Grand Central, 11:15 AM to 1:00 PM
- Tampa Python Meetup — Student Mentoring @ SecureSet in Tampa, 11:30 AM to 12:30 PM
- Learn Cybersecurity Tampa — War Games: Applied Cryptography 1 @ SecureSet, 5:30 PM to 9:00 PM
- Tampa Software QA and Testing Meetup — Artificial intelligence and testing in the digital age @ Sogeti Tampa, 5:30 PM to 7:30 PM
- Lean Beer for All Things Agile (St Petersburg) @ Pour Tap Room, 6:00 PM to 7:30 PM
- Joomla! User Group Tampa — Making your Joomla site fly – Optimizing your website for Google page speed. @ SiteWit Corporation, 6:00 PM to 7:30 PM
- Tampa Bay Azure User Group — IoT – Concepts, Practical Use Cases and Azure @ Microsoft Corporation, 6:00 PM to 8:00 PM
- Lean Beer for All Things Agile (Lakeland) @ Swan Brewing, 6:00 PM to 8:30 PM
- Tampa Bay AWS User Group — Happy Hour with AWS Experts @ 6:00 PM to 8:00 PM
- Seffner D&D Meetup — 1st ed AD&D Campaign. Open to new players. @ 6:00 PM to 10:00 PM
- Tampa Bay UX Group — World Usability Day Tampa Bay 2018 Workshop: Avoiding ‘Accidentally Evil’ Design @ Keiser University Tampa, 6:30 PM to 8:30 PM
- Front-End Design Meetup — Accessible By Design with Tim Knight @ Suncoast Developers Guild, 6:30 PM to 8:00 PM
- Geekocracy! — Feast at Mr. Dunderbak’s Biergarten! @ Mr. Dunderbak’s, 7:00 PM to 10:00 PM
- IPA’s & API’s — Version 2.0 @ Arkane Aleworks, 7:00 PM to 9:00 PM
Friday, November 9
- Lean Coffee for All Things Agile (Waters Location) @ Panera Bread (6001 W Waters Ave, Tampa), 7:30 AM to 8:30 AM
- Lean Coffee for All Things Agile (St Petersburg / Tyrone) @ Panera (2420 66th St North, St Petersburg), 7:30 AM to 8:30 AM
- Florida Funders Institute Lunch & Learn @ USF Health, CAMLS, 11:30 AM to 1:00 PM
- Game On – Gaming — Storm King’s Thunder (Subject to interest) @ Game On Movies & Games, 6:00 PM to 9:00 PM
- Game Club Tampa Meetup — New 5e Campaign – Heroes of Faerun (LFP) @ Grand Arena of Mind Expansion, 6:30 PM to 11:00 PM
- Tampa Monopoly Meetup @ Panera Bread (112 S Westshore, Tampa)
- Geekocracy! — National Louisiana Day! @ Tibby’s New Orleans Kitchen, 7:30 PM to 10:30 PM
- Geekocracy! — Tampa Theater: Bill and Ted’s Excellent Adventure! @ The Tampa Theater, 10:30 PM to 12:30 AM
Saturday, November 10
- BarCamp Tampa Bay @ University Mall – West Wing (near the old J.C. Penney store), 8:00 AM to 5:00 PM
- Tampa Drones Meetup — Racing Drone Free Fly @ Highlander Park, 10:00 AM to 1:00 PM
- Tampa Drones Meetup — FPV Experience – MOSI Volunteer Opportunity @ MOSI (Museum of Science and Industry), 10:00 AM to 2:30 PM
- Toastmasters District 48 — The Order of Smedley .. Advanced Club @ 10:30 AM to 12:30 PM
- Game Club Tampa Meetup — G.A.M.E RPG Convention @ Grand Arena of Mind Expansion, 12:00 PM to 10:00 PM
- Tampa School of AI — Catch up on projects and ideas @ Starbucks (3619 W Gandy Blvd, Tampa), 12:00 PM to 2:00 PM
- Nerd Night Out — NNO Book Club: The Fifth Season @ Origami Sushi, 1:00 PM to 3:00 PM
- Holiday Robotics Meetup | For Adults | Novices Welcome — First Meetup! @ Glory Days Grill, 1:00 PM to 3:00 PM
- Critical Hit Games — Keyforge Prelaunch Event #1 @ Critical Hit Games, 3:00 PM to 6:00 PM
- Board Games and Card Games in Sarasota & Bradenton — Games at Kelly & Scott’s House @ 6:00 PM to 9:00 PM
- St. Pete Makers — Open Make Night / Open House @ St. Pete Makers, 6:00 PM to 8:00 PM
- Critical Hit Games — Keyforge Prelaunch Event #2 @ Critical Hit Games, 8:00 PM to 11:00 PM
Sunday, November 11
With Student Interest Soaring, Berkeley Creates New Data-Sciences Division
From Chronicle of Higher Education:
Berkeley’s move follows MIT’s announcement last month that it was investing $1 billion in a new college of artificial intelligence. But leaders at Berkeley say their disclosure of the division today was driven by an imminent international search for a director, who will hold the title of associate provost, putting the program on an institutional par with Berkeley’s colleges and schools. They explain that in creating a division rather than a new college, they are reflecting the way data science has become woven into every discipline.
Berkeley has been planning the division for four years, said David Culler, interim dean for data sciences, and has been rolling it out incrementally through a new data-sciences major approved last year, and corresponding growth in data-science courses. Enrollment in “Foundations of Data Science” has soared from 100 in 2015 to 1,300 in 2018. Enrollment in the upper-level “Principles and Techniques of Data Science” has grown from 100 in 2016 to 800 students. The emerging program has served as a “pilot” for the division, which is now set to evolve under a new director.
…
The core of the data-science curriculum, said Culler, is computer science and statistics, with additional depth courses in optimization and visualization. But students will also be required to have a “domain emphasis” that would most likely synthesize material from various other departments. For instance, a data-science student’s exploration of social inequality might include courses in sociology, ethnic studies, economics, and philosophy.
‘With a basic degree, you can learn data science on the job’
Next week at the National Analytics Conference, [Jennifer Cruise from the Aon Centre for Innovation and Analytics] will be on a panel where she expects to discuss several aspects and challenges that businesses face relating to data, including how to deal with the abundance of information that is now available and, of course, the key issues of skills and resources.
“You can only truly exploit the data if you get the right people in that space, and there’s a double whammy,” she said. “On the one hand, you have a lack of hands-on resources. Skilled data scientists are hard to come by and things are changing quickly, so people who are qualified need to stay on top of things. Then, you also have a gap in the leadership space – the people who can advise you how to turn [data] into revenue for your company, or how to use your data to become more operationally efficient.”
8 common questions from aspiring data scientists, answered
So, you want to become a data scientist? Great. But you have zero experience and have no clue how to get started in this field. I get it. I’ve been there and I definitely feel you. This is why this post is for you.
All the questions below came from the community through my LinkedIn post, email, and other channels. I hope that by sharing my experience, you will be enlightened on how to pursue a data science career and make your learning journey fun.
OPINION: How to craft effective data science job descriptions
In today’s data science job market, demand far outstrips supply, said Chris Nicholson, co-founder and CEO of artificial intelligence and deep learning company Skymind, and co-creator of the open source framework Deeplearning4j. That means organizations must resist the temptation to seek candidates with every last required data science skill in favor of hiring for potential and then training on the job, he said.
“A lot of data science has to do with statistics, math and experimentation—so you’re not necessarily looking for someone with a computer science or software engineering background, though they should have some programming experience,” Nicholson said. “You want folks from physical science, math, physics, natural sciences backgrounds; people who are trained to think about statistical ideas and use computational tools. They need to have the ability to look at data and use tools to manipulate it, explore correlations and produce data models that make predictions.”
Because a data scientist’s job isn’t to engineer entire systems, minimal programming experience is fine, Nicholson said. After all, most organizations can rely on software engineering, DevOps, or IT teams to build, manage and maintain infrastructure in support of data science efforts. Instead, strong data science candidates often have a background in science and should be proficient with data science tools in one or more different stacks.
If you want to get into data science with a limited budget, this reading list is for you — it’s all about data science and related books that you can get for free!
Allen B. Downey’s free Python and math books
Allen B. Downey is a believer in free books, and has a whole article explaining why. Here are its concluding paragraphs:
A free book is the root of a tree of potential adaptations, translations, and entirely new books that branch out from the original. Free books transform readers into proof-readers, editors, anthologists, correspondents, contributors, collaborators, writers and authors.
If you are thinking about writing a book, start soon, release early and often, give up control but do a little policing, keep a contributor list, and make it free.
He’s written a number of free books, and the ones most applicable to data science are:
- Think Python, 2nd edition. A great introduction to programming and Python 3 that’s suitable for beginners, but also a great reference for people like me who have to bounce between languages. You can read it online or download it in PDF form, and code examples and solutions are available in this GitHub repo.
- Think Stats, 2nd edition. If you know Python but don’t know probability and statistics, this is your book! It encourages you to work with real datasets, and presents you with a case study using real data from the U.S. National Institutes of Health. You can read it online, or download it in PDF form, and the code examples and solutions are available in this GitHub repo.
- Think Bayes. Downey writes that most books on Bayesian statistics (the kind where you update probabilities as new evidence or data comes in) express their ideas in complex math notation or use calculus. This book expresses its ideas using Python and simpler discrete math approximations: “what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.” You can read it online, download it in PDF form, and the code examples are available in this GitHub repo and in this Jupyter notebook. A quick taste of that discrete approach appears right after this list.
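To make the “summation instead of an integral” idea concrete, here’s a minimal sketch in plain Python (my own toy example, not code from Think Bayes) that estimates a coin’s bias using a discrete grid of hypotheses, a loop, and a summation:

```python
# A toy discrete Bayesian update (not from the book): estimate a coin's bias
# after seeing 140 heads in 250 flips, using a grid of hypotheses instead of
# a continuous prior. The integral becomes a summation; the update is a loop.

hypotheses = [h / 100 for h in range(101)]              # candidate biases 0.00 ... 1.00
prior = {h: 1 / len(hypotheses) for h in hypotheses}    # uniform prior

heads, tails = 140, 110

# Multiply each prior by the likelihood of the data under that hypothesis...
unnormalized = {h: prior[h] * (h ** heads) * ((1 - h) ** tails) for h in hypotheses}

# ...then normalize with a summation (the discrete stand-in for an integral).
total = sum(unnormalized.values())
posterior = {h: weight / total for h, weight in unnormalized.items()}

# The posterior mean is another simple loop-and-sum.
posterior_mean = sum(h * p for h, p in posterior.items())
print(f"Posterior mean bias: {posterior_mean:.3f}")
```

If I remember the book correctly, its Pmf class is essentially this kind of dictionary-of-probabilities bookkeeping, with nicer ergonomics.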
Bayesian Methods for Hackers
Bayesian Methods for Hackers is described as “an intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view”, and its key chapters are available online, for free, in Jupyter notebook form. The authors’ recommended way to read it is to clone the book’s Jupyter notebook repo and run the notebooks on your own machine.
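If you’d like a taste of the probabilistic-programming style before cloning anything, here’s a toy model of my own (not code from the book), assuming you have PyMC3, one of the libraries the book’s notebooks are built around:

```python
# A toy model of my own (not from the book), assuming PyMC3 is installed:
# infer a coin's bias from a handful of observed flips.
import numpy as np
import pymc3 as pm

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])   # 1 = heads, 0 = tails

with pm.Model():
    p = pm.Uniform("p", lower=0, upper=1)           # prior belief about the bias
    pm.Bernoulli("obs", p=p, observed=flips)        # likelihood of the observed flips
    trace = pm.sample(2000, tune=1000)              # draw samples from the posterior

print("Estimated bias:", trace["p"].mean())         # posterior mean of p
```

The book goes much deeper, but the pattern (declare priors, declare a likelihood over observed data, then sample) is the same throughout.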
The Python Data Science Handbook
Another Python/data science book in Jupyter notebook form! This one assumes that you’re familiar with Python, as it’s all about the libraries that are most used for data science and machine learning: NumPy, Pandas, Matplotlib, and Scikit-Learn.
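To give a feel for the territory the book covers, here’s a quick sketch of my own (not an excerpt from the book) that touches all four libraries in a typical mini-workflow:

```python
# A minimal sketch (not from the book) touching the four libraries it covers:
# NumPy for arrays, Pandas for tabular data, Scikit-Learn for modeling,
# and Matplotlib for plotting.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
df["y"] = 3 * df["x"] + rng.normal(scale=2, size=len(df))   # noisy linear data

model = LinearRegression().fit(df[["x"]], df["y"])
print("Estimated slope:", model.coef_[0])

plt.scatter(df["x"], df["y"], label="data")
plt.plot(df["x"], model.predict(df[["x"]]), label="fit")
plt.legend()
plt.show()
```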
R Programming for Data Science
This book brings the fundamentals of R programming to you, using the same material developed as part of the industry-leading Johns Hopkins Data Science Specialization. The skills taught in this book will lay the foundation for you to begin your journey learning data science.
…
This book is about the fundamentals of R programming. You will get started with the basics of the language, learn how to manipulate datasets, how to write functions, and how to debug and optimize code. With the fundamentals provided in this book, you will have a solid foundation on which to build your data science toolbox.
This book is available for free in PDF, EPUB, and MOBI formats (there’s a $20 suggested price, but you can pay what you want).
R for Data Science
This book will teach you how to do data science with R: You’ll learn how to get your data into R, get it into the most useful structure, transform it, visualise it and model it. In this book, you will find a practicum of skills for data science. Just as a chemist learns how to clean test tubes and stock a lab, you’ll learn how to clean data and draw plots—and many other things besides. These are the skills that allow data science to happen, and here you will find the best practices for doing each of these things with R. You’ll learn how to use the grammar of graphics, literate programming, and reproducible research to save time. You’ll also learn how to manage cognitive resources to facilitate discoveries when wrangling, visualising, and exploring data.
The Art of Data Science
This book writes down the process of data analysis with a minimum of technical detail. What we describe is not a specific “formula” for data analysis, but rather is a general process that can be applied in a variety of situations. Through our extensive experience both managing data analysts and conducting our own data analyses, we have carefully observed what produces coherent results and what fails to produce useful insights into data. This book is a distillation of our experience in a format that is applicable to both practitioners and managers in data science.
This book is available for free in PDF, EPUB, and MOBI formats (there’s a $20 suggested price, but you can pay what you want).
Jupyter rising
First of all, if you’re interested in a one-day conference that also gets you a chance to enjoy Florida’s warm winter and Disney World as well, check out DevFest Florida 2019. It takes place on Saturday, January 19, 2019, and I’ll be giving the Jumping into Jupyter Notebooks presentation, which will largely be a hands-on code-along-with-me exercise (or just watch, if you like) showing just what you can do with a Jupyter notebook, Python, and some data. If you know a little Python and are new to Jupyter notebooks, data science, or both, you’ll want to catch my presentation!
At this point, you might be asking “What are Jupyter Notebooks, anyway?”
Jupyter notebooks are a kind of computational notebook, a class of software that creates documents that mix:
- Stuff that you’d expect to find in a typical document, such as text, pictures, and multimedia, and
- stuff that you wouldn’t expect to find in a typical document, such as code and its output.
To borrow a paragraph from an article that just appeared in Nature, Why Jupyter is data scientists’ computational notebook of choice:
Jupyter is a free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document. Computational notebooks have been around for decades, but Jupyter in particular has exploded in popularity over the past couple of years. This rapid uptake has been aided by an enthusiastic community of user–developers and a redesigned architecture that allows the notebook to speak dozens of programming languages — a fact reflected in its name, which was inspired, according to co-founder Fernando Pérez, by the programming languages Julia (Ju), Python (Py) and R.
You may want to think of a Jupyter notebook as a wiki with a REPL. A notebook’s contents are divided into cells, each of which contains either:
- Narrative content, which you enter in Markdown, and
- Code (and, once it runs, its output), which you can enter in Python or nearly four dozen other programming languages. See the example cell below.
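Here’s a hypothetical example of what a single Python code cell might hold (my own snippet, not taken from any real notebook); run the cell, and the chart and table appear directly beneath it:

```python
# What one code cell in a hypothetical Python notebook might hold: build a tiny
# table and plot it. In a notebook, the chart and the table render as the cell's
# output, directly below the code.
import pandas as pd
import matplotlib.pyplot as plt

temps = pd.DataFrame({
    "month": ["Sep", "Oct", "Nov"],
    "avg_high_f": [90, 85, 78],   # made-up numbers, purely for illustration
})

temps.plot(x="month", y="avg_high_f", kind="bar", legend=False, title="Average highs")
plt.show()

temps   # a notebook renders the last expression in a cell as output
```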
Jupyter notebooks’ format lends itself well to a number of research and educational uses. Once again, from the Nature article:
Computational notebooks are essentially laboratory notebooks for scientific computing. Instead of pasting, say, DNA gels alongside lab protocols, researchers embed code, data and text to document their computational methods. The result, says Jupyter co-creator Brian Granger at California Polytechnic State University in San Luis Obispo, is a “computational narrative” — a document that allows researchers to supplement their code and data with analysis, hypotheses and conjecture.
For data scientists, that format can drive exploration. Notebooks, [Lorena] Barba says, are a form of interactive computing, an environment in which users execute code, see what happens, modify and repeat in a kind of iterative conversation between researcher and data. They aren’t the only forum for such conversations — IPython, the interactive Python interpreter on which Jupyter’s predecessor, IPython Notebook, was built, is another. But notebooks allow users to document those conversations, building “more powerful connections between topics, theories, data and results”, Barba says.
Researchers can also use notebooks to create tutorials or interactive manuals for their software. This is what Mackenzie Mathis, a systems neuroscientist at Harvard University in Cambridge, Massachusetts, did for DeepLabCut, a programming library her team developed for behavioural-neuroscience research. And they can use notebooks to prepare manuscripts, or as teaching aids. Barba, who has implemented notebooks in every course she has taught since 2013, related at a keynote address in 2014 that notebooks allow her students to interactively engage with — and absorb material from — lessons in a way that lectures cannot match. “IPython notebooks are really a killer app for teaching computing in science and engineering,” she said.
Ed. note: Before they were called Jupyter notebooks, they were called IPython notebooks.
Jupyter notebooks have recently received big boosts from big names. One of them is economist Paul Romer, who won the 2018 Nobel Prize in Economics; he’s a convert from Mathematica to Python and Jupyter notebooks.
Another big Jupyter booster isn’t from academia: it’s Netflix, where Jupyter notebooks are the most popular data tool.
Keep an eye on Jupyter notebooks. I’m pretty sure you’ll see them more often quite soon.
Getting into Jupyter notebooks
If you’re interested in trying them out, you may find these links handy:
DJ Patil’s code of ethics for data science
2.5 quintillion bytes of data are created every day. It’s created by you when you commute to work or school, when you’re shopping, when you get medical treatment, and even when you’re sleeping. It’s created by you, your neighbors, and everyone around you. So, how do we ensure it’s used ethically?
Back in 2014, before I entered public service, I wrote a post called Making the World Better One Scientist at a Time that discussed concerns I had at the time about data. What’s interesting is how much of it is still relevant today. The biggest difference? The scale of data and coverage of data has massively increased since then, and with it the opportunity to do both good and bad.
…
With the old adage that with great power comes great responsibility, it’s time for the data science community to take a leadership role in defining right from wrong. Much like the Hippocratic Oath defines Do No Harm for the medical profession, the data science community must have a set of principles to guide and hold each other accountable as data science professionals. To collectively understand the difference between helpful and harmful. To guide and push each other in putting responsible behaviors into practice. And to help empower the masses rather than to disenfranchise them. Data is such an incredible lever arm for change, we need to make sure that the change that is coming, is the one we all want to see.
So how do we do it? First, there is no single voice that determines these choices. This MUST be a community effort. Data Science is a team sport, and we’ve got to decide what kind of team we want to be.
Should data scientists adhere to a Hippocratic oath?
From Wired (February 8, 2018):
The tech industry is having a moment of reflection. Even Mark Zuckerberg and Tim Cook are talking openly about the downsides of software and algorithms mediating our lives. And while calls for regulation have been met with increased lobbying to block or shape any rules, some people around the industry are entertaining forms of self-regulation. One idea swirling around: Should the programmers and data scientists massaging our data sign a kind of digital Hippocratic oath?
Microsoft released a 151-page book last month on the effects of artificial intelligence on society that argued “it could make sense” to bind coders to a pledge like that taken by physicians to “first do no harm.” In San Francisco Tuesday, dozens of data scientists from tech companies, governments, and nonprofits gathered to start drafting an ethics code for their profession.
The general feeling at the gathering was that it’s about time that the people whose powers of statistical analysis target ads, advise on criminal sentencing, and accidentally enable Russian disinformation campaigns woke up to their power, and used it for the greater good.
“We have to empower the people working on technology to say ‘Hold on, this isn’t right,’” DJ Patil, chief data scientist for the United States under President Obama, told WIRED. (His former White House post is currently vacant.) Patil kicked off the event, called Data For Good Exchange. The attendee list included employees of Microsoft, Pinterest, and Google.
Schaun Wheeler’s take on codes of data ethics: Not just unimplementable, but built on the wrong foundation
I’m still making up my mind about Schaun Wheeler’s contrarian take on codes of data ethics, but that may be colored by my dislike of Joel Grus’ dickishly libertarian “fuck your ethics” stance. I’ve included Wheeler’s take, along with his interview on Joel Grus’ podcast, Adversarial Learning, for the sake of completeness, with the caveat that I’m undecided on it.
On the difficulty of creating a data science code of ethics (Hackernoon, February 2, 2018):
dj patil recently wrote about the need for a code of ethics for data science. It’s not clear to me that data science as a profession is ready for a code of ethics. Codes are just words unless there is a mechanism to enforce sanctions against people who disregard those codes, and I’m pretty sure no single data science community is cohesive enough to enforce rules even for its own members.
An ethical code can’t be about ethics (Towards Data Science, February 6, 2018):
Last week, I wrote about my skepticism of Data for Democracy’s intent to create a data science code of ethics. My concerns focused on the practical feasibility of the project. After a lot of talking about, reading, and watching the evolution of the D4D code of ethics, I still believe the proposed principles are largely unactionable. I also believe, now, that what the working groups have produced is built on the wrong foundation entirely. This isn’t about iterating forward to a solution. No amount of revision can succeed if you’re building the wrong thing.
We need to be clear on what a code of ethics means. If we can realistically expect everyone in the community to just adopt a code of ethics because they intuitively feel that it’s the good and right thing to do, then the code of ethics is unnecessary — it amounts to nothing more than virtue signaling. If we can’t realistically expect complete organic adoption, then the code is a mechanism to coerce those who disagree with it, to censure people who don’t abide by it. Those two routes — wholesale freewill adoption or coercion — are the only two ways a code of ethics can actually mean anything.
Ed. note: Wheeler uses the phrase “virtue signaling” in this essay, a phrase that I think of as the real-world equivalent of “Hail Hydra”: a way for villains to identify themselves to their comrades.
Can we be honest about ethics? (Hackernoon, March 5, 2018):
Ethics is not a solvable problem but it is a manageable risk. No set of principles, not even a robust legal and regulatory infrastructure, will ensure ethical outcomes. Our goal should be to ensure that algorithm design decisions are made by competent, ethical individuals — preferably, by groups of such individuals. If we improve competency, we improve ethics. Most ethical mistakes come from the inability to foresee consequences, not the inability to tell right from wrong.
An effective ethical code doesn’t need to — in fact, probably shouldn’t — focus on ethical issues. What matters most are the consequences, not the tools we use to bring those consequences about. As long as an ethical code stipulates ways individual practitioners can prove their competence by voluntarily taking on “unnecessary” costs and risks, it will weed out the less competent and the less ethical. That’s the list we should be building. That’s the product that will result in a more ethical profession.
My code of ethics will forbid YAML (Adversarial Learning podcast, May 25, 2018):
Joel Grus’ and Andrew K. Musselman’s podcast, Adversarial Learning, has Schaun Wheeler as a guest to talk about his stance on the proposed code of ethics. When listening to this episode, I couldn’t shake the feeling that I was listening to three smug white guys with (sometimes literally) no skin in the game.
Weapons of Math Destruction, by Cathy O’Neil
Worthwhile reading, or listening (the author reads the audiobook herself). I enjoyed it!
A former Wall Street quant sounds an alarm on the mathematical models that pervade modern life — and threaten to rip apart our social fabric
We live in the age of the algorithm. Increasingly, the decisions that affect our lives—where we go to school, whether we get a car loan, how much we pay for health insurance—are being made not by humans, but by mathematical models. In theory, this should lead to greater fairness: Everyone is judged according to the same rules, and bias is eliminated.
But as Cathy O’Neil reveals in this urgent and necessary book, the opposite is true. The models being used today are opaque, unregulated, and uncontestable, even when they’re wrong. Most troubling, they reinforce discrimination: If a poor student can’t get a loan because a lending model deems him too risky (by virtue of his zip code), he’s then cut off from the kind of education that could pull him out of poverty, and a vicious spiral ensues. Models are propping up the lucky and punishing the downtrodden, creating a “toxic cocktail for democracy.” Welcome to the dark side of Big Data.
deon: An ethics checklist for data scientists
From the deon site:
deon is a command line tool that allows you to easily add an ethics checklist to your data science projects. We support creating a new, standalone checklist file or appending a checklist to an existing analysis in many common formats.
δέον • (déon) [n.] (Ancient Greek) (Wiktionary)
Duty; that which is binding, needful, right, proper.
The conversation about ethics in data science, machine learning, and AI is increasingly important. The goal of deon is to push that conversation forward and provide concrete, actionable reminders to the developers that have influence over how data science gets done.
Here are the first two sections of the default checklist that deon generates:
Data Science Ethics Checklist
A. Data Collection
- A.1 Informed consent: If there are human subjects, have they given informed consent, where subjects affirmatively opt-in and have a clear understanding of the data uses to which they consent?
- A.2 Collection bias: Have we considered sources of bias that could be introduced during data collection and survey design and taken steps to mitigate those?
- A.3 Limit PII exposure: Have we considered ways to minimize exposure of personally identifiable information (PII) for example through anonymization or not collecting information that isn’t relevant for analysis?
B. Data Storage
- B.1 Data security: Do we have a plan to protect and secure data (e.g., encryption at rest and in transit, access controls on internal users and third parties, access logs, and up-to-date software)?
- B.2 Right to be forgotten: Do we have a mechanism through which an individual can request their personal information be removed?
- B.3 Data retention plan: Is there a schedule or plan to delete the data after it is no longer needed?