
Know Your Cat Ports

[The “Know Your Cat Ports” comic]

For more comics like this, see www.slowwave.com.


Windows Mobile Case Study: Porting Amplitude to WinMo

HTC phone with Amplitude on screen (simulated)

The Windows Mobile Blog points to an MSDN article covering how Amplitude, an application for the iPhone, was ported to Windows Mobile.

Here’s a quick description of Amplitude, which is developed by Gripwire, a mobile and social app company based in Seattle, courtesy of the Windows Mobile Blog:

Amplitude picks up any sound in a user’s surroundings through the microphone and then amplifies the sound, rendering it into a rich graphical representation on the device. Amplitude can be used to amplify any sounds, such as human or animal heartbeats, that usually wouldn’t be picked up by the human ear. Amplitude provides a cool user interface featuring an oscilloscope that allows users to view and visually quantify signal voltages, so you can see the volume of the sound you’re listening to.

The MSDN article on the Amplitude porting project covers a lot of ground.

Whether you’re thinking of expanding your iPhone application to other platforms or starting a new Windows Mobile app project, you’ll find this case study packed with useful information and links. I’m going to expand on some of the topics covered in the article in future posts on this blog.

And don’t forget – whenever you submit a mobile app to Windows Marketplace for Mobile, you’re automatically entered in the Race to Market Challenge.


Science 2.0: How Computational Science is Changing the Scientific Method

This article also appears in Canadian Developer Connection.

Victoria Stodden speaking at the Science 2.0 conference    

Here’s the third in a series of notes from the Science 2.0 conference, a conference for scientists who want to know how software and the web are changing the way they work. It was held on the afternoon of Wednesday, July 29th at the MaRS Centre in downtown Toronto and attended by 102 people. It was a little different from most of the conferences I attend, where the primary focus is on writing software for its own sake; this one was about writing or using software in the course of doing scientific work.

My previous notes from the conference:

This entry contains my notes from Victoria Stodden’s presentation, How Computational Science is Changing the Scientific Method.

Here’s the abstract:

As computation becomes more pervasive in scientific research, it seems to have become a mode of discovery in itself, a “third branch” of the scientific method. Greater computation also facilitates transparency in research through the unprecedented ease of communication of the associated code and data, but typically code and data are not made available and we are missing a crucial opportunity to control for error, the central motivation of the scientific method, through reproducibility. In this talk I explore these two changes to the scientific method and present possible ways to bring reproducibility into today’s scientific endeavor. I propose a licensing structure for all components of the research, called the “Reproducible Research Standard”, to align intellectual property law with longstanding communitarian scientific norms and encourage greater error control and verifiability in computational science.

Here’s her bio:

Victoria Stodden is the Law and Innovation Fellow at the Internet and Society Project at Yale Law School, and a Fellow at Science Commons. She was previously a Fellow at Harvard’s Berkman Center and postdoctoral fellow with the Innovation and Entrepreneurship Group at the MIT Sloan School of Management. She obtained a PhD in Statistics from Stanford University, and an MLS from Stanford Law School.

The Notes

  • Her research examines how massive computation has changed the practice of science and the scientific method
    • Do we have new modes of knowledge discovery?
    • Are standards of what we consider knowledge changing?
    • Why aren’t researchers sharing?
    • One of my concerns is facilitating reproducibility
      • The Reproducible Research Standard
      • Tools for attribution and research transmission
  • Example: Community Climate Model
    • Collaborative system simulation
    • There are community models available
    • Built on open code, data
    • If you want to model something as complex as climate, you need data from different fields
    • Hence, it’s open
  • Example: High energy physics
    • Enormous data produced at LHC at CERN — 15 petabytes annually
    • Data shared through grid
    • CERN director: 10 – 20 years ago, we might have been able to repeat an experiment – they were cheaper, simpler and on a smaller scale. Today, that’s not the case
  • Example: Astrophysics
    • Data and code sharing, even among amateurs uploading their photos
    • Simulations: This isn’t new: even in the mid-1930s, they were trying to calculate the motion of cosmic rays in Earth’s magnetic field via simulation
  • Example: Proofs
    • Mathematical proof via simulation vs deduction
    • My thesis was proof via simulation – the results were not controversial, but the methodology was

Victoria Stodden and her "Really Reproducible Research" slide

  • The rise of a “Third Branch” of the Scientific Method
    • Branch 1: Deductive/Theory: math, logic
    • Branch 2: Inductive/Empirical: the machinery of hypothesis testing – statistical analysis of controlled experiments
    • Branch 3: Large-scale extrapolation and prediction – are we gaining knowledge from computation/simulations, or are they just tools for inductive reasoning?
    • Contention — is it a 3rd branch?
      • See Chris Anderson’s article, The End of Theory (Wired, June 2008)
      • Systems that explain the world without a theoretical underpinning?
      • There’s the “Hillis rebuttal”: Even with simulations, we’re looking for patterns first, then create hypotheses, the way we always have
      • Steve Weinstein’s idea: Simulation underlies both branches:
        • It’s a tool to build intuition
        • It’s also a tool to test hypotheses
      • Simulations let us manipulate systems that can’t fit in a lab
    • Controlling error is central to scientific process

Victoria Stodden at Science 2.0 and her "Top reasons not to share" slide

  • Computation is increasingly pervasive in science
    • In the Journal of the American Statistical Association (JASA):
      • In 1996: 9 out of 20 articles published were computational
      • In 2006: 33 out of 35 articles published were computational
  • There’s an emerging credibility crisis in computational science
    • Error control forgotten? Typical scientific computation papers don’t include code and data
    • Published computational science is near impossible to replicate
    • JASA June 1996: None of the computational papers provided any code
    • JASA June 2006: Only 3 out of the 33 computational articles made their code publicly available
  • Changes in scientific computation:
    • Internet: Communication of all computational research details and data is possible
    • Scientists often post papers but not their complete body of research
    • Changes coming: Madagascar, Sweave, individual efforts, journal requirements
  • A potential solution: Really reproducible research
    • The idea that the article is not the scholarship itself, but merely the advertisement of that scholarship
  • Reproducibility: can a member of the field independently verify the result?

Victoria Stodden at Science 2.0, with her "Controlling error" slide

  • Barriers to sharing
    • Took a survey of computational scientists
    • My hypotheses, based on the literature of scientific sociology:
      • Scientists are primarily motivated by personal gain or loss
      • Scientists are primarily worried about being “scooped”
  • Survey:
    • The people I surveyed were from the same subfield: Machine learning
    • They were American academics registered at a top machine learning conference (NIPS)
    • Respondents: 134 responses from 638 requests (23%, impressive)
    • They were all from the same legal environment of American intellectual property
  • Based on comments, the worry about being scooped is in the back of people’s minds
    • Reported sharing habits
      • 32% make their code available on the web
      • 48% make their data available
      • 81% claimed they reveal their code
      • 84% claimed they reveal their data
      • Visual inspection of their sites revealed:
        • 30% had some code posted
        • 20% had some data posted
  • Preliminary findings:
    • Surprising: They were motivated to share by communitarian ideals
    • Surprising: They were concerned about copyright issues
  • Barriers to sharing: legal
    • The original expression of ideas falls under copyright by default
    • Copyright creates exclusive right of author to:
      • Reproduce work
      • Prepare derivative works
  • Creative Commons
    • Make it easier for artists to share and use creative works
    • A suite of licences that allows the author to determine the terms
    • Licences:
      • BY (attribution)
      • NC (non-commercial)
      • ND (no derived work)
      • SA (share-alike)
  • Open Source Software Licensing
  • Creative Commons follows the licensing approach used for open source software, but adapted for creative works
  • Code licences:
    • BSD licence: attribution
    • GPL: attribution and share-alike
  • Can this be applied to scientific work?
  • The goal is to remove copyright’s block to fully reproducible research
  • Attach a licence with an attribution to all elements of the research compendium

Victoria Stodden at the Science 2.0 conference and her "Real and Potential Wrinkles" slide

  • Proposal: Reproducible research standard
    • Release media components (text, data) under CC BY
    • Code: Modified BSD or MIT (attribution only)
  • Releasing data
    • Raw facts alone are generally not copyrightable
    • Selection or arrangement of data results in a protected compilation only if the end result is an original intellectual creation (US and Canada)
    • Subsequently qualified: facts not copied from another source can be subject to copyright protection
  • Benefits of RRS
    • Changes the discussion from “here’s my paper and results” to “here’s my compendium”
    • Gives funders, journals and universities a “hook”
    • If your funding is public, so should your work!
    • Standardization avoids licence incompatibilities
    • Clarity of rights beyond fair use
    • IP framework that supports scientific norms
    • Facilitation of research, thus citation and discovery
  • Reproducibility is Subtle
    • Simple case: open data and small scripts – fits a simple definition of reproducibility
    • Hard case: Inscrutable code; organic programming
    • Harder case: Massive computing platforms, streaming sensor data
    • Can we have reproducibility in the hard cases?
    • Where are acceptable limits on non-reproducibility?
      • Privacy
      • Experimental design
    • Solutions for harder cases
      • Tools
  • Openness and Taleb’s criticism
    • Scientists are worried about contamination by amateurs
    • Also concerned about the “Prisoner’s dilemma”: they’re happy to share their work, but not until everyone else does

Science 2.0: A Web Native Research Record – Applying the Best of the Web to the Lab Notebook

This article also appears in Canadian Developer Connection.

Cameron Neylon and his "Creative Commons" slide at Science 2.0

Intro

Here’s the second of my notes from the Science 2.0 conference, a conference for scientists who want to know how software and the web are changing the way they work. It was held on the afternoon of Wednesday, July 29th at the MaRS Centre in downtown Toronto and attended by 102 people. It was a little different from most of the conferences I attend, where the primary focus is on writing software for its own sake; this one was about writing or using software in the course of doing scientific work.

My previous notes from the conference:

This entry contains my notes from Cameron Neylon’s presentation, A Web Native Research Record – Applying the Best of the Web to the Lab Notebook.

Here’s the abstract:

Best practice in software development can save researchers time and energy in the critical analysis of data but the same principles can also be applied more generally to recording research process. Successful design patterns on the web tend to be those that successfully couple people into efficient information transfer mechanisms. Can we re-think the way we create, keep, and share our research records by using these design patterns to make it more effective?

Here’s Cameron’s bio:

Cameron Neylon is a biophysicist who has always worked in interdisciplinary areas and is a leading advocate of data availability. He currently works as Senior Scientist in Biomolecular Sciences at the ISIS Neutron Scattering facility at the Science and Technology Facilities Council. He writes and speaks regularly on the interface of web technology with science and is well-known as one of the leading proponents of open science.

The Notes

  • Feel free to copy and remix this presentation – it’s licensed under Creative Commons


  • What is the web good for?
    • Publishing
    • Subscribing
    • Syndicating
    • Remixing, mashing up and generally doing stuff with content
    • Collaborating
  • What do scientists do?
    • Publish
    • Syndicate (CRC books are a form of syndication)
    • Remix (take stuff from different disciplines, pull things together, remix them)
    • Validate
    • Collaborate
  • So, with this overlap, the web has solved science problems, right?
    • No — papers are dead, broken and disconnected
      • Papers don’t have links
      • The whole scientific record is fundamentally a dead document
    • The links between things make the web go round
    • I want to make science less like a great big monolithic document and make it more like a network of pieces of knowledge, wired together:
      • Fragments of science
      • Loosely coupled
      • Tightly wired

Cameron Neylon and his "Fragments of science / Loosely coupled / Tightly wired" slide at Science 2.0

  • What is a “fragment of science”?
    • A paper is too big a piece, even if it is the "minimal publishable unit"
    • A tweet is too small
    • A blog post would be the right size
  • His lab book is a collection of various electronic documents:
    • Excel files
    • Some basic version control
    • Data linked back to description of process used to create the data
    • As far as possible, the blogging is done automatically by machines
    • It doesn’t have to be complicated
  • [Shows a scatter plot, with each point representing an experiment]:
    • Can we tell an experiment didn’t work by its position on the graph?
    • We can tell which experiments weren’t recorded properly – they have no links to other experiments
  • The use of tagging and “folksonomies” goes some way, but how do you enforce it?
    • Tags are inconsistent — not just between people, but even within a single person: you might tag the same thing differently from day to day
    • Templates create a virtuous circle, a self-assembling ontology
    • We found that in tagging, people were mixing up process and characteristics – this tells us something about the ontology process

Cameron Neylon and his "Physical objects / Digital objects" slide at Science 2.0

  • Put your data in external services where appropriate
    • Flickr for images
    • YouTube for video
    • RCSB PDB (the Protein Data Bank)
    • Chemspider
    • Even Second Life can be used as a graphing medium!
    • All these services know how to deal with specific data types
  • Samples can be offloaded
    • LIMS, database, blogs, wiki, spreadsheet
    • Procedures are just documents
    • Reuse existing services
    • Semantic feed of relationships — harness Google: most used is the top result
  • Semantic web creates UI issues
    • Just trying to add meaning to results is one step beyond what scientists are expected to do
    • We need a collaborative document environment
    • The document environment must feel natural for people to work in
    • When they type something relevant, the system should realize that and automatically link it
    • We’re at the point where document authoring systems can use regular expressions to recognize relevant words and autolink them (see the sketch below)
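
Here’s a rough sketch in Python of the kind of regex-driven autolinking Neylon describes – a minimal sketch under my own assumptions, with the glossary terms and URLs invented purely for illustration:

    import re

    # Hypothetical glossary mapping recognized terms to their records.
    GLOSSARY = {
        "lysozyme": "http://example.org/samples/lysozyme",
        "PCR": "http://example.org/procedures/pcr",
    }

    # One pattern that matches any glossary term as a whole word.
    TERMS = re.compile(r"\b(" + "|".join(re.escape(t) for t in GLOSSARY) + r")\b")

    def autolink(text):
        # Replace each recognized term with an HTML link to its record.
        return TERMS.sub(
            lambda m: '<a href="%s">%s</a>' % (GLOSSARY[m.group(1)], m.group(1)),
            text,
        )

    print(autolink("We purified lysozyme and then ran PCR on the samples."))

A real system would pull the glossary from a database or ontology rather than a hard-coded dictionary, but the principle is the same: the author just types, and the system recognizes and links the terms.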

Cameron Neylon and his "Open" slide at Science 2.0

  • The current mainstream response to these ideas:
    • It runs the gamut from “You mean Facebook?” to horror
    • I’m not worried about these ideas not getting adopted
  • Scientists are driven by impact and recognition
    • How do we measure impact?
      • Right now, we do this by counting the number of papers for which you’re an author
      • Most of my output is not published in traditional literature; it’s published freely on the web for other people to use
      • If they’re not on the web, they disappear from the net
      • The future measure of your scientific impact will be its effect on the global body of knowledge
      • Competition will drive adoption

Barbara Liskov, Interviewed

This article also appears in Canadian Developer Connection.

Barbara Liskov

The Interview

Over at the IT Manager Connection blog, there’s an interview with Barbara Liskov, who is:

  • The Ford Professor of Engineering at MIT’s Electrical Engineering and Computer Science Department
  • An Institute Professor at MIT
  • The first woman in the United States to earn a Ph.D. in computer science
  • The 2008 ACM Turing Award recipient (the award was announced in 2009)
  • The 2004 IEEE John von Neumann Medal recipient
  • A Fellow of the ACM and of the American Academy of Arts and Sciences
  • …and most relevant to us, the “Liskov” in the Liskov Substitution Principle, one of the five SOLID principles for object-oriented design.

In the interview, Barbara talks about winning “the Nobel Prize of computing”, her vision for computing, what got her interested in computers, the challenges that the field still presents to minorities, the work she’s done and her thoughts on up-and-coming tech. If you’d like to listen, here’s the MP3 of Stephen Ibaraki interviewing Barbara Liskov. Stephen also wrote an article containing an abbreviated transcript that appears in IT Manager Connection. Enjoy!

The Liskov Substitution Principle

Small Liskov Substitution Principle poster

In case you’ve forgotten (or perhaps never learned), the Liskov Substitution Principle is:

If for each object o1 of type S there is an object o2 of type T such that for all programs P defined in terms of T, the behavior of P is unchanged when o1 is substituted for o2, then S is a subtype of T.

Well, duh. Who didn’t know that?

Object guru Robert C. “Uncle Bob” Martin took this bit of math nerd-speak and paraphrased it in a way that makes it somewhat easier to follow:

Functions that use pointers or references to base classes must be able to use objects of derived classes without knowing it.

And because I’m nowhere near as smart as Uncle Bob, here’s the way I like to cover it:

If MySubclass is a subclass of MyClass, you should be able to replace instances of MyClass with MySubclass without breaking anything. Sort of like when they changed the actors who played "Darren" in Bewitched or "Becky" in Roseanne.

(Unlike Liskov or Martin, I don’t have to write academic papers, so I can get away with making references to old TV shows.)
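
To make that concrete, here’s a minimal sketch in Python – the class and function names are mine, purely for illustration:

    class MyClass:
        def greet(self):
            return "Hello from MyClass"

    class MySubclass(MyClass):
        # Overrides greet() while honouring the same contract:
        # same arguments, same kind of return value, no new
        # preconditions and no surprise exceptions.
        def greet(self):
            return "Hello from MySubclass"

    def print_greeting(obj):
        # Written against MyClass; it shouldn't need to know (or care)
        # which concrete class it actually receives.
        print(obj.greet())

    print_greeting(MyClass())     # works
    print_greeting(MySubclass())  # still works -- that's substitutability

Any code written against MyClass keeps working when you hand it a MySubclass; that’s all the principle asks.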

As I mentioned earlier, I’ll be writing more about the SOLID principles. Watch this space!


TechDays 2009 Sessions Announced, and Other News

Microsoft TechDays Canada 2009: 2 days - 7 cities - 5 tracks - 40 sessions - plus more!

Developer Sessions at TechDays

The sessions for TechDays 2009, Microsoft’s cross-Canada conference taking place in seven cities this fall, have been posted on the TechDays site. You can go there to see the full set of sessions, or you can check the table below to look at the sessions for the tracks related to software development.

I’m the lead for the Developing for the Microsoft-Based Platform track and John Bristowe is lead for the Developer Fundamentals and Best Practices track. John and I picked the best developer-focused sessions from this year’s TechEd conference and put them into our tracks. We’ve also chosen speakers for each session in each of TechDays’ seven cities, going for local developers wherever possible. TechDays features international conference material and local speakers, right near where you live. We’re not just expanding your knowledge, we’re stretching your dollar, too!

And now, the developer sessions…

Track: Developing for the Microsoft-Based Platform

Learning key skills to develop rich client and web-based applications on the Microsoft-based platform is what this track is all about. In this track you will learn how to develop rich, interactive and interoperable applications for both the client and the web using our newest tools and frameworks. You’ll learn how to build software that helps to give your users the best experience possible, whether it’s a program running on Windows 7, a website built on ASP.NET MVC or a Silverlight-based rich internet application. You’ll also learn how to build services that can deliver data to almost any platform and internet-enabled device. And finally, you’ll learn how to build this software and these services in ways that are modular and maintainable.

Track: Developer Fundamentals and Best Practices

This track is all about taking your skills up a notch while at the same time ensuring effective and efficient interaction with all members of the development team, from IT architect to developer to tester. You will learn about the importance of Application Lifecycle Management (ALM) and how to leverage the Visual Studio development platform to streamline your efforts. You will learn some best practices from industry professionals while building upon your technical foundation.

Day One (Platform track): Front End – User Interface and Experience

Day One (Fundamentals track): Core Fundamentals and Best Practices

Day 1, Session 1 (Platform track):
What’s New in Silverlight 3

Rich internet applications just got richer! Silverlight 3 is packed with new features and improvements that your users will notice, from pixel shaders to perspective 3D to animation enhancements to bitmap APIs to HD video. We think you’ll also be impressed by the features for developers, such as the updated style model, data binding improvements, better resource handling, and a tuned-up Web services stack. In this session, we’ll explore new features of Silverlight 3 as we build a Silverlight-based application using Expression Blend 3 and Visual Studio.

Day 1, Session 1 (Fundamentals track):
Tips and Tricks for Visual Studio

This session enhances your experience with Visual Studio. Keyboard shortcuts, macros, layouts, fonts, tools, and external utilities are all very powerful and underused features of Visual Studio. This session makes you more productive in Visual Studio. Bring your pen and pad because you’ll definitely want to take notes!

Day 1, Session 2 (Platform track):
Expression Blend for Developers

Not a designer? Overwhelmed by Expression Blend? Not a problem! We’ll show you how to use Expression Blend to create advanced and polished user interfaces for business applications, consumer applications, multimedia projects, games or anything in between. We’ll cover features of Expression Blend from a developer’s perspective and show how it works in tandem with Visual Studio throughout the development process. You’ll learn how to create professional-looking user interfaces and visual elements – even if you don’t think of yourself as an interface designer.

Day 1, Session 2 (Fundamentals track):
Test Driven Development Techniques

In a recent empirical study from Microsoft Research, four case studies were conducted, and the results indicated that teams using Test-Driven Development (TDD) saw pre-release bugs decrease by 40-90% relative to similar projects that did not use TDD. Subjectively, the teams experienced a 15-35% increase in initial development time after adopting TDD. In this session, you’ll learn some of the key techniques for effectively using TDD to drive the creation of better software, reduce the defect density in your projects, and help improve overall productivity.

Day 1, Session 3 (Platform track):
Building Modular Applications Using Silverlight and WPF

How do you build extensible and maintainable line-of-business applications in Silverlight and Windows Presentation Foundation (WPF)? How do you design and code to handle real-world complexity? Composite Application Guidance (a.k.a. "PRISM") offers guidance, libraries and examples – in small, free-standing, digestible chunks – that you can use to build applications with rich user interfaces that are also easier to maintain and extend. You’ll learn how to compose complex UIs from simpler views, integrate loosely coupled components with "EventAggregator" and "Commands", develop independent modules that can be loaded dynamically, and share code between Silverlight and WPF clients.

Day 1, Session 3 (Fundamentals track):
Patterns for the Rest of Us

Patterns. Patterns. Patterns. You hear them everywhere. We’re told to use them and call them by names, as if the pattern is a colleague of ours. Hey, did you see Observable Pattern in the demo this morning? If you feel left out in conversations where Pattern buzzwords are thrown around, this session is for you. This session introduces Patterns with imagery, code, terms, and fun and games to help you better understand and remember pattern usage.

Day 1, Session 4 (Platform track):
Optimizing Your Apps for the Windows 7 User Experience

This session will show you the Windows 7 APIs that will let your applications – and your users – get the full Windows 7 experience. Learn about new extensibility methods to surface your application’s key tasks. Discover how enhancements to the taskbar, Start Menu, thumbnails, desktop elements, the Scenic Ribbon, Federated Search and Internet Explorer 8 provide new ways for you to delight your users and help make them more productive. If you want to give your users the best Windows 7 experience, this session is for you!

Day 1, Session 4 (Fundamentals track):
A Strategic Comparison of Data Access Technologies from Microsoft

Thanks to recent innovations from Microsoft including LINQ, the Entity Framework and ADO.NET Data Services, choosing a technology for data access architecture has become a subject for debate. Among other things, developers must balance productivity, elegance, and performance. Some common questions include: Are data readers and data sets still useful? How should I choose between LINQ and Entity Framework models? Should I design custom entities or use types that follow the database schema? Should I use ADO.NET Data Services to expose my data model or control access via Windows Communication Foundation (WCF) business services? This session looks at data access architecture for each of these technologies, illustrates common practices when employing each, discusses pros and cons, and helps you better understand how to choose the right technology for your scenario.

Day Two (Platform track) – Back End: Programming Frameworks and Principles

Day Two (Fundamentals track) – Team System Fundamentals and Best Practices

Day 2, Session 1 (Platform track):
Introducing ASP.NET MVC

You’ve probably heard the buzz about Model-View-Controller (MVC) web frameworks. They’re all the rage because they combine speed, simplicity, control…and fun. ASP.NET MVC is Microsoft’s MVC web framework, and in this session, we’ll talk about the MVC pattern, explain the ideas behind ASP.NET MVC and walk through the process of building an application using this new web framework. We’ll also cover several techniques to get the most out of ASP.NET MVC and deliver web applications quickly and with style.

Day 2, Session 1 (Fundamentals track):
Practical Web Testing

This session is about looking at the past, present, and future of Web testing. We begin by looking at how Web testing was accomplished before the arrival of Microsoft Visual Studio Team System. Next, you will learn about the Web and load testing tools available in Visual Studio Team System 2005/2008.

Day 2, Session 2 (Platform track):
SOLIDify Your Microsoft ASP.NET MVC Applications

Object-oriented programming makes it easier to manage complexity, but only if you do it right. The five SOLID principles of class design (one for each letter) help ensure that you’re writing applications that are flexible, comprehensible and maintainable, and we’ll explain and explore them in this session. We’ll start with a brittle ASP.NET MVC application that’s badly in need of refactoring and fix it by applying the SOLID principles. This session is a good follow-up to Introducing ASP.NET MVC, but it’s also useful for ASP.NET MVC developers looking to improve their code – or even for developers who aren’t planning to use ASP.NET MVC at all. The SOLID principles apply to programming in any object-oriented language or framework.

Day 2, Session 2 (Fundamentals track):
Better Software Change and Configuration Management Using TFS

A critical factor in getting the most out of Team Foundation Server is understanding the version control and build systems. In this session, learn how to use Team Build and Team Foundation Server Version Control to effectively manage concurrent development branches. Learn how to set up your repository structure and how to define builds. Learn about different branching techniques like branch by feature and branch for release. Learn how builds help you find what has changed in branches and how to manage releases, service packs, and hot fixes. Attend this session to see how the API can help create better release documentation and get you out the door sooner.

Day 2, Session 3 (Platform track):
Building RESTful Services with WCF

REST (REpresentational State Transfer) is an architectural style for building services, and it’s the architectural style of the web. It’s been popular outside the world of Microsoft development for a long time, but it’s quickly becoming the de facto standard inside as well. Windows Communication Foundation (WCF) makes it simple to build RESTful web services, which are easy to use, simple and flexible. In this session, we’ll cover the basics of REST and then show you how to build REST-based, interoperable web services that can be accessed not just by Microsoft-based web and desktop applications, but by anything that can communicate via HTTP – from Ajax clients to feed readers to mobile devices to applications written using other languages and frameworks such as PHP, Python/Django or Ruby/Rails.

Day 2, Session 3 (Fundamentals track):
Metrics That Matter: Using Team System for Process Improvement

Process improvement without adequate metrics is shooting in the dark — you might hit your target, but it’s impossible to aim and difficult to determine how close you were to hitting your goal. In this session we look at how Microsoft Visual Studio Team System collects data, and how we can modify our process to collect the right data. Then we talk about several candidate metrics (top ten key metrics) that many real-world organizations have used to achieve real improvements and help get an excellent return on investment in Team Foundation Server implementation. We frame the discussion and demos around using a process improvement effort (either formal or informal) to help your Team System implementation get you the ROI you deserve!

Day 2, Session 4 (Platform track):
Developing and Consuming Services for SharePoint

The world gets more service-oriented every day, and with that comes the demand to integrate all kinds of services, including those from SharePoint. This session introduces SharePoint as a developer platform and provides an overview of how you can build and deploy custom services with it. The focus will be on developing ASP.NET and Windows Communication Foundation services for SharePoint as well as building a Silverlight client to consume them.

Day 2, Session 4 (Fundamentals track):
Database Change Management with Team System

If you develop database-enabled applications on top of SQL Server, you owe it to yourself to consider doing it better with Visual Studio Team System. In this session, you’ll learn about changes to how the product works under the covers and what that means to you. Then, you’ll learn how to use the product to design, build, and deploy your databases to development, test, and production environments – all with purpose and method instead of the more traditional madness that can be found in many shops in the wild.

Free TechNet Plus Subscription for TechDays Attendees

Your admission to TechDays gets you more than just two days’ worth of conference and networking. We’re also putting together a package of goodies that you can use long after we’ve turned out the lights at the last TechDays venue.

One such goodie is a full year’s subscription to TechNet Plus, the Microsoft IT pro resource that gives you, among other things, full, non-time-limited versions of operating systems, servers and Office System software for evaluation (non-production) use. It also gives you access to pre-release versions, a full technical information library, two free tech support calls, and more. It’s a US$349 value that you get for free if you attend TechDays.

More Than Just a Conference

In addition to coming to a city near you to hold TechDays, we’re planning activities for each city in our tour – things like user group events, academic events, Coffee and Code and more! Watch this blog for announcements for your city.

We also have some surprises in store, and we’ll announce them…soon.

Register at the Early Bird Price

You could pay the full price of CDN$599 if you really wanted to. We think that you’d rather save a whole $300 and pay just CDN$299. The early bird price for any of the TechDays cities is available only until 6 weeks before that city’s conference, and the Vancouver and Toronto conferences are happening in September. Procrastinate at your peril – register now!


Science 2.0: Choosing Infrastructure and Testing Tools for Scientific Software Projects

Titus Brown at the podium at MaRS, delivering his presentation.

Here’s the first of my notes from the Science 2.0 conference, a conference for scientists who want to know how software and the web are changing the way they work. It was held on the afternoon of Wednesday, July 29th at the MaRS Centre in downtown Toronto and attended by 102 people. It was a little different from most of the conferences I attend, where the primary focus is on writing software for its own sake; this one was about writing or using software in the course of doing scientific work.

This entry contains my notes from C. Titus Brown’s presentation, Choosing Infrastructure and Testing Tools for Scientific Software Projects. Here’s the abstract:

The explosion of free and open source development and testing tools offers a wide choice of tools and approaches to scientific programmers.  The increasing diversity of free and fully hosted development sites (providing version control, wiki, issue tracking, etc.) means that most scientific projects no longer need to self-host. I will explore how three different projects (VTK/ITK; Avida; and pygr) have chosen hosting, development, and testing approaches, and discuss the tradeoffs of those choices.  I will particularly focus on issues of reliability and reusability juxtaposed with the mission of the software.

Here’s a quick bio for Titus:

C. Titus Brown studies developmental biology, bioinformatics and software engineering at Michigan State University, and he has worked in the fields of digital evolution and physical meteorology. A cross-cutting theme of much of his work has been software development for computational science, which has led him to software testing and agile software development practices. He is also a member of the Python Software Foundation and the author of several widely-used Python testing toolkits.

  • Should you do open source science?
    • Ideological reason: Reproducibility and open communication are supposed to be at the heart of good science
    • Idealistic reason: It’s harder to change the world when you’re trying to do good science and keep your methods secret
    • Pragmatic reason: Maybe having more eyes on your project will help!
  • When releasing the code for your scientific project to the public, don’t worry about which open source licence to use – the important thing is to release it!
  • If you’re providing a contact address for your code, provide a mailing list address rather than your own
    • It makes it look less “Mickey Mouse” – you don’t seem like one person, but a group
    • It makes it easy to hand off the project
    • Mailing lists are indexed by search engines, making your project more findable
  • Take advantage of free open source project hosting


  • Distributed version control
    • “You all use version control, right?” (Lots of hands)
    • For me, distributed version control was awesome and life-changing
    • It decouples the developer from the master repository
    • It’s great when you’re working away from an internet connection, such as if you decide to do some coding on airplanes
      • The distributed nature is a mixed blessing
        • One downside is "code bombs" – effectively forks of the project, created when people don’t check in changes often enough
      • Code bombs lead to complicated merges
      • Personal observation: the more junior the developer, the more they feel that their code isn’t “worthy” and they hoard changes until it’s just right. They end up checking in something that’s very hard to merge
    • Distributed version control frees you from permission decisions – you can simply say to people who check out your code "Do what you want. If I like it, I’ll merge it."


  • Open source vs. open development
      • Do you want to simply release the source code, or do you want participation?
      • I think participation is the better of the two
    • Participation comes at a cost, in both support time and attitude
      • There’s always that feeling of loss of control when you make your code open to use and modification by other people
      • Some professors hate it when someone takes their code and does "something wrong" with it
      • You’ll have to answer “annoying questions” about your design decisions
      • Frank ("insulting") discussion of bugs
      • Dealing with code contributions is time-consuming – it takes time to review them
    • Participation is one of the hallmarks of a good open source project

 Slide: "The Stunning Realization"

  • Anecdote: I used to work on the “Project Earthshine” climatology project
    • The idea behind the project was to determine how much of the sunlight hitting the Earth was being reflected away
    • You can measure this by observing the crescent moon: the bright part is lit directly by the sun; the dark part is also lit – by sunlight reflected from the Earth
    • You can measure the Greenhouse Effect this way
    • It’s cheaper than measuring sunlight reflected by the Earth directly via satellite
  • I did this work at Big Bear Lake in California, where a solar observatory had telescopes set up to measure this effect
  • I went through the source code of the application they were using, trying to figure out what the grad student who worked on it before me had done
  • It turned out that to get “smooth numbers” in the data, his code applied a correction several times
  • His attitude was that there’s no such thing as too many corrections
  • "He probably went on to do climate modelling, and we know how that’s going"
  • How do we know that our code works?
    • We generally have no idea whether our code works; all we do is gain hints
    • And what does "works" mean anyway, in the context of research programming? Does it mean that it gives results that your PI expects?
  • Two effects of that Project Earthshine experience:
  • Nowadays, if I see agreement between 2 sources of data, I think at least one of them must be wrong, if not both
  • I also came to a stunning realization that:
    • We don’t teach young scientists how to think about software
    • We don’t teach them to be suspicious of their code
    • We don’t teach them good thought patterns, techniques or processes
    • (Actually, CS folks don’t teach this to their students either)
  • Fear is not a sufficient motivator: there are many documented cases where things have gone wrong because of bad code, and they will continue to do so
  • If you’re throwing out experimental data because of its lack of agreement with your software model, that’s not a technical problem, that’s a social problem!

 
  • Automated testing
    • The basic idea behind automated testing is to write test code that runs your main code and verifies that its behaviour is what you expect (see the code sketch after this list)
    • Example – regression test
      • Run program with a given set of parameters and record the output
      • At some later time, run the same program with the same parameters and record the output
      • Did the output change in the second run, and if so, do you know why?
      • This is a different question from "is my program correct?"
      • If results change unintentionally, you should ask why
    • Example – functional test
      • Read in known data
      • Check that the known data matches your expectations
      • Does your data loading routine work?
      • It works best if you also test with "tricky" data
    • Example – assertions
      • Put "assert parameter >=0" in your code
      • Run it
      • Do I ever pass garbage into this function?
      • You’ll be surprised how often things that "should never happen" do happen
      • Follow the classic Cold War motto: “Trust, but verify”
    • There are other kinds of automated testing (acceptance testing, GUI testing), but they don’t usually apply to scientists
    • In most cases, you don’t need to use specialized testing tools
    • One exception is a code coverage tool
      • Answers the question “What lines of code are executed?”
      • Helps you discover dead code branches
      • Guide test writing to untested portions of code
    • Continuous integration
      • Have several "build clients" building your software, running tests and reporting back
      • Does my code build and run on Windows?
      • Does my code run under Python 2.4? Debian 3.0? MySQL 4?
      • Answers the question: “Is there a chance in hell that anyone else can use my code?”
    • Automated testing locks down "boring" code (that is, code you understand)
      • Lets you focus on "interesting" code – tricky code or code you don’t understand
      • Freedom to refactor, tinker, modify, for you and others
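
Here’s what the regression, functional and assertion examples above might look like as plain Python – analyze() is a hypothetical stand-in for a scientific routine, and the “recorded” value is invented for illustration:

    def analyze(values):
        # Assertion: catch garbage input that "should never happen".
        assert len(values) > 0, "analyze() needs at least one value"
        return sum(values) / len(values)

    def test_regression():
        # Regression test: compare today's output against output
        # recorded from an earlier run. If it changes, find out why.
        recorded = 2.0
        assert analyze([1.0, 2.0, 3.0]) == recorded

    def test_functional():
        # Functional test: read in known (and "tricky") data and check
        # that the result matches expectations worked out by hand.
        assert analyze([0.0]) == 0.0
        assert analyze([-1.0, 1.0]) == 0.0

    if __name__ == "__main__":
        test_regression()
        test_functional()
        print("All tests passed.")

A runner like nose or py.test will discover and run the test_* functions automatically, and a coverage tool can then report which lines of analyze() the tests actually exercise.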

C. Titus Brown delivering his presentation at MaRS 

  • If you want to suck people into your open source project:
    • Choose your technology appropriately
    • Write correct software
    • Automated testing can help
  • Closed source science is not science
    • If you can’t see the code, it’s not falsifiable, and if it’s not falsifiable, it’s not science!