21 May 2012

Improve Code Reuse with Search

In the past, I have been critical of simply crawling content with a search engine and using the basic keyword query approach to find things. However, some situations do lend themselves to just this approach (with a tad bit of "intelligence"). One of these situations is code reuse.

At Consejo, we use Subversion (coupled with VisualSVN Server) for source control. Subversion (or SVN) is an open-source source control system. VisualSVN Server provides a terrific web interface to the repository. Combined with the VisualSVN plugin for Visual Studio, it lets us manage our code production across clients very effectively at little or no cost. Unfortunately, with our team dispersed, it's sometimes difficult to make everyone aware of what's already been built for one client or another.

SIDE NOTE: I have personally been critical of using open source solutions and even wrote an article arguing against the open source model. However, it's clear there are cases where open source makes sense (yup, I was wrong). Further, we've been a financial supporter of SVN (through donations) since we started using the tool regularly.

Much of what consulting companies do is based on past work. It's critical to how we work that every consultant is kept abreast of what the firm is doing, across industries, disciplines and clients. For example, we recently built two different applications for two very different clients. Both applications, however, used the same licensed interface controls. In fact, many of the interactions users had with each application were very similar. As a result, the techniques we used, the problems we solved and the utility code (code not specific to a client or application) we constructed could be leveraged across both projects with minor updates. This saved both our consultants and our clients time and, more importantly, money. Unfortunately, since the two projects overlapped, there was no good way, other than having team members on both projects, to capture and surface code reuse opportunities in an automated way. Each team basically had to know what the other team was doing and ask specific questions (or discover reuse opportunities through happenstance). That is neither a fabulous nor a scalable model.

While we solved this problem the "old fashioned" way, it's not a good long-term solution. As projects get more complicated, and team members become more geographically or temporally dispersed, the "old fashioned" way becomes very burdensome and super inefficient. So what's the solution?

Historically, we'd been looking for ways to effectively expose our SVN repository to our consultants outside of Visual Studio and a web browser. The thought was that if developers were able to search for specific code constructs or even development pattern names, there was a reasonable chance of finding reusable code snippets or whole libraries (if they existed). And while there are a few tools that help in this regard, like FishEye and SvnQuery, none was a perfect fit for our needs. Since VisualSVN Server presents the repository as a web site, why couldn't we just use a standard search engine like MS Search Express, SharePoint or GSA (Google Search Appliance) to crawl the repository and allow consultants to query it like a web site?

The answer, as Kenneth Scott points out in his blog post on crawling VisualSVN with a search engine, is "not exactly." The problem stems from the way VisualSVN renders the repository web interface (you should read his post for the details). However, Kenneth solved this problem with an HTTP handler and gave us exactly what we needed. Using his utility, we can indeed use a standard search engine (take your pick) and let our developers search for an example of an Excel-like editing experience using a Telerik GridControl, or discover whether we've built a custom authentication provider for SharePoint. We've done both, but until recently I knew about only one of them.

You may still be wondering why this is a good idea. I've been critical of this kind of shotgun search approach in the past, so why should it work now? The reason is the very narrow information domain and the very specific terms developers use. Together, they make finding content with straight keyword queries far more reasonable. For example, if I want to find examples of our use of the Telerik GridControl, I can simply search for the RadGrid class name. If I want to figure out whether we'd built a .NET membership provider, I can search for the inheritance statement. In the first case, I may get an overwhelming number of results. However, they'll all be examples of use, since the only time that class name would appear is when we're using it in code (in other words, every result is relevant). The second search would produce far fewer results and likely give me exactly what I need immediately (relevant and, probably, highly precise results).
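To make the precision argument concrete, here is a toy sketch of the idea, not our actual setup: a tiny inverted index over identifier-like tokens, written in Python. The file paths and snippets in DOCS are hypothetical stand-ins for files crawled from the repository, and the tokenizer and AND-style matching are assumptions chosen for illustration. A real crawler and search engine (MS Search Express, SharePoint, GSA) behave differently in detail, but the effect is similar: a class name like RadGrid, or the base class in an inheritance statement, only shows up where the code actually uses it.

```python
import re
from collections import defaultdict

# Hypothetical file contents standing in for files crawled from the repository.
DOCS = {
    "ClientA/Orders.aspx.cs": "var grid = new RadGrid(); grid.AllowMultiRowEdit = true;",
    "ClientB/Auth/SpMembershipProvider.cs": "public class SpMembershipProvider : MembershipProvider { }",
    "ClientB/Readme.txt": "Notes about the grid layout for the dashboard.",
}

# Identifier-like tokens; punctuation such as ':' is ignored.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(docs):
    """Map each case-sensitive token to the set of paths that contain it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for token in TOKEN.findall(text):
            index[token].add(path)
    return index


def search(index, query):
    """Return the sorted paths that contain every token in the query (AND semantics)."""
    tokens = TOKEN.findall(query)
    if not tokens:
        return []
    # index.get avoids inserting empty entries into the defaultdict.
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted(hits)
```

Usage: with `index = build_index(DOCS)`, `search(index, "RadGrid")` returns only `ClientA/Orders.aspx.cs`, and `search(index, ": MembershipProvider")` returns only `ClientB/Auth/SpMembershipProvider.cs`. The readme that merely talks about a "grid" never matches the class name, which is the narrow-domain effect described above.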

In the end, I’m only really disappointed that I didn’t discover Kenneth’s blog post sooner, but my past search queries were about SVN or Subversion, not VisualSVN (poor search strategy on my part).  However, as we develop this code search feature inside of our intranet, I’m excited by the prospect of finding internal examples of code we can reuse.

If you have a similar feature inside your firm, I'd love to hear how it works and whether it has yielded higher code reuse.