Chris Birchall

Author
since Dec 08, 2014

Recent posts by Chris Birchall

Looks like I'm a bit late to the party here, but I was interested in this part of the original post:

Mohamed Sanaulla wrote: I have been thinking of UI or End to End tests but then time taken to incorporate such an activity would be a critical concern.



Actually, writing UI tests is not as difficult or time-consuming as you might imagine. I often use Capybara to write UI tests, because I like the readability of its Ruby DSL. There's very little setup needed, and each test is only a few lines of code. If you're not a fan of Ruby, there's always Selenium or FluentLenium.

If you're working with legacy code and it's difficult to break dependencies and add unit tests, UI tests might be the best place to start. Once you have those in place, you can work your way in, writing component-level functional tests and eventually unit tests.

One common problem encountered when writing UI tests is that the HTML was not written with testing in mind, so it's difficult to find a particular element using XPath or CSS selectors. But this is easily fixed by sprinkling a few id attributes through the HTML.

I'm not planning to discuss DB refactoring specifically in the book, but in the chapter on full rewrites I will talk about how to manage the DB, e.g. sharing the DB with the existing application vs. creating a new DB from scratch.

Usually I find it easier to leave the DB alone as much as possible, and instead put effort into writing a good data access layer that can hide the crazy details of the legacy DB from the rest of the application. One reason for taking this approach is that there are often other systems (batches, reporting scripts, etc.) that also depend on the DB, making it difficult to change.
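A minimal Java sketch of such a data access layer, assuming a hypothetical legacy table with cryptic column names (`CUST_NM_1`, `DEL_FLG`), a space-padded name column and a 'Y'/'N' deleted flag. To keep it runnable without a database, a `Map` of rows stands in for the legacy table; in real code `LegacyCustomerRepository` would run JDBC queries instead. All class and column names are made up for illustration.

```java
import java.util.Map;
import java.util.Optional;

// Clean domain object: this is all the rest of the application ever sees.
final class Customer {
    final long id;
    final String name;
    final boolean active;

    Customer(long id, String name, boolean active) {
        this.id = id;
        this.name = name;
        this.active = active;
    }
}

// The data access interface the application codes against.
interface CustomerRepository {
    Optional<Customer> findById(long id);
}

// Hides the quirks of a hypothetical legacy schema behind CustomerRepository.
// A Map of rows stands in for the real table so the sketch runs without a DB.
final class LegacyCustomerRepository implements CustomerRepository {
    private final Map<Long, Map<String, Object>> legacyTable;

    LegacyCustomerRepository(Map<Long, Map<String, Object>> legacyTable) {
        this.legacyTable = legacyTable;
    }

    @Override
    public Optional<Customer> findById(long id) {
        Map<String, Object> row = legacyTable.get(id);
        if (row == null) {
            return Optional.empty();
        }
        // Legacy quirks are handled in one place: a space-padded name column,
        // and a "deleted" flag stored as 'Y'/'N' instead of an "active" boolean.
        String name = ((String) row.get("CUST_NM_1")).trim();
        boolean active = !"Y".equals(row.get("DEL_FLG"));
        return Optional.of(new Customer(id, name, active));
    }
}
```

The other systems that read the table (batches, reports) keep working unchanged, because only `LegacyCustomerRepository` knows about the legacy layout.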

When you're doing a full rewrite, the biggest potential pitfall to be aware of is a lack of consensus about the spec. Rewrites usually fall into one of three categories:

  • A simple port to a new technology stack. You want the new system to work exactly the same as the old one (i.e. the existing implementation is the spec).
  • You're using the rewrite as an opportunity to clean up and update the spec (in which case you will need somebody to decide on the new spec and document it).
  • As well as rewriting the existing functionality, you're also implementing some new features. This is quite common, because you need to give the business side some value in return for letting you rewrite a legacy system.


    First of all, you need to make sure everybody (devs, managers and any other stakeholders) is in agreement about which category you're in. Also make sure this is documented clearly. Otherwise you'll have endless arguments about every aspect of the spec, with some people saying it should emulate the existing system's behaviour while others point out that that behaviour can be improved upon.

    You will also need to agree on and document the scope of the project, in order to avoid feature creep. "Since we're taking the trouble to rewrite, we could also add a new feature here, and a new feature here, and ..." In the case of a simple porting project, feature creep is not usually a problem, but in the latter two cases everybody needs to be clear on exactly what you're building before you start the implementation. You don't need to decide the minute details up front; just a feature list is fine. Ideally you should also prepare a list of features that are OK to drop from the first release, in case the implementation takes longer than you expected.

    Rule #1 is: Don't badmouth the existing code. Just complaining about the poor quality of the code is not very helpful, and it can be dangerous. I vividly remember saying to another developer, "this code appears to have been written by someone who doesn't understand the basic concepts of Java concurrency", only to find out later that it was written by the person I was talking to. *cringe*

    When trying to convince the management/business side to allocate time for refactoring, you have to frame the discussion in terms they understand. As developers, the main reason we want to refactor is to make the code easier to work with, reason about and extend. In other words, to make our own jobs more enjoyable. Unfortunately, for the business as a whole, "happier developers" is not a primary goal. It doesn't directly generate value for the business. (Of course, having happier, more motivated employees has an indirect effect on the health and profitability of the company, but there's a long chain of cause and effect in between.)

    So try to come up with a way to express the benefits of refactoring in terms that management can understand. e.g. "Without refactoring, feature X will take N weeks to implement. If we spend a week on refactoring beforehand, feature X will take 2*N/3 weeks to implement, and future development will also be faster." Or "The web app currently has hundreds of XSS vulnerabilities, putting our users at risk. If we don't refactor and we instead try to fix them one by one, it'll take a year to fix them all, and we might accidentally miss some."
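Taking the first example at face value (one week of refactoring, after which feature X costs two thirds of its original estimate N), the break-even point can be worked out directly:

```latex
\underbrace{1 + \tfrac{2}{3}N}_{\text{refactor first}}
  \;<\; \underbrace{N}_{\text{no refactoring}}
\quad\Longleftrightarrow\quad \tfrac{1}{3}N > 1
\quad\Longleftrightarrow\quad N > 3
```

So for a feature estimated at N = 6 weeks, refactoring first costs 1 + 4 = 5 weeks, a week saved before even counting the faster future development; for features of 3 weeks or less, the case rests entirely on that future speed-up.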

    My book doesn't really focus on code smells specifically.

    I didn't know about the book that Tushar linked to, but it looks really good. My recommendation: buy that!

    Atul, I'm glad to see you've noticed that the human side is often the most important thing to tackle when dealing with legacy code. This is a problem that most developers are reluctant to discuss. Without consensus in the development team about how we should be writing and maintaining code, the legacy code will never be fixed.

    There's no easy solution to this problem, and fixing it will take a lot of time and patience. Try to think of yourself as a 'code quality evangelist', and make it your mission to get other developers interested in code quality and motivated to improve it.

    Often the problem stems from two causes: a lack of communication and a lack of motivation.

    Lack of communication: If developers are talking to each other regularly about the code, they are more likely to treat the code as something that they built together as a team, and something to take pride in. You can try a few things to get developers talking more, including code reviews, pair programming, daily stand-up meetings, dev lunches (where you get together and play with a new technology for an hour while eating lunch) and hackathons.

    Lack of motivation: If you're faced with a huge legacy codebase, it's easy to think that it's impossible to improve it. There's just so much code to be fixed! If you can find a way to split this huge, insurmountable problem into small chunks, developers will be more motivated to start fixing things. They can fix one small thing at a time, and it's easy for them to see that they are making progress. Tools like FindBugs and SonarQube can help you to find 'hotspots' of poor quality in the codebase, making it easier to decide where to focus your refactoring efforts.

    First off, let me say that the Feathers book is AWESOME. I've been re-reading it a lot recently as research for my own book. As Atul says, it was published 10 years ago, but I don't think it's showing its age at all. The principles it explains (such as finding a seam for introducing tests) are still valid, and the refactoring techniques are applicable to modern OO code.
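For readers new to the idea of a seam, here is a minimal Java sketch of one kind Feathers describes, an object seam created with the "Subclass and Override Method" technique. `InvoiceProcessor` and its flat 10% tax are hypothetical; the point is that a test can replace the awkward side effect (`sendEmail`) without touching the business logic it wants to check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical legacy class: business logic mixed with a hard-wired side effect.
class InvoiceProcessor {
    int process(int amountInCents) {
        int total = amountInCents + tax(amountInCents);
        sendEmail("Invoice total: " + total); // this call makes process() awkward to test
        return total;
    }

    int tax(int amountInCents) {
        return amountInCents / 10; // hypothetical flat 10% tax
    }

    // The seam: an overridable (non-private, non-final) method wrapping the side effect.
    void sendEmail(String body) {
        throw new UnsupportedOperationException("would contact a real mail server");
    }
}

// Lives in the test code: overrides only the side effect and records it instead.
class TestableInvoiceProcessor extends InvoiceProcessor {
    final List<String> sentEmails = new ArrayList<>();

    @Override
    void sendEmail(String body) {
        sentEmails.add(body);
    }
}
```

The production class only needed one small change (making sure `sendEmail` is overridable), which is what makes this a safe first step before any tests exist.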

    Please see my comment here for an explanation of how my book is different.

    As the other commenters suggested, exploratory refactoring is the best way I know to find my way around a new codebase. Randomly choose an important-sounding file and open it up. Add comments, rename fields and methods to make them easier to understand, extract separate interfaces for each of the concrete class's concerns, ... and whatever else takes your fancy. I often start by giving classes and methods ridiculously long and descriptive names like processABatchOfItemsAndStoreTheResultInTheDBAndThenSendAnEmailAboutIt(). These names stand out in the code, acting as useful markers of code that should be refactored later.
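A minimal sketch of where that kind of exploratory refactoring might end up, assuming a hypothetical batch class whose three concerns (processing items, storing results in the DB, sending an email) have been pulled out into separate interfaces. The interface names are made up; the deliberately long method name from the post is kept as a marker for later clean-up.

```java
import java.util.List;

// One small interface per concern that used to live inside a single concrete class.
interface ItemProcessor {
    List<String> process(List<String> items);
}

interface ResultStore {
    void store(List<String> results); // the DB concern
}

interface Notifier {
    void send(String message); // the email concern
}

final class BatchJob {
    private final ItemProcessor processor;
    private final ResultStore store;
    private final Notifier notifier;

    BatchJob(ItemProcessor processor, ResultStore store, Notifier notifier) {
        this.processor = processor;
        this.store = store;
        this.notifier = notifier;
    }

    // Ridiculously long name on purpose: it stands out as a marker for later refactoring.
    void processABatchOfItemsAndStoreTheResultInTheDBAndThenSendAnEmailAboutIt(List<String> items) {
        List<String> results = processor.process(items);
        store.store(results);
        notifier.send("Processed " + results.size() + " items");
    }
}
```

Because each concern is now an interface, the job can be exercised with lambdas standing in for the real DB and mail code, which also helps when you later decide which of these experimental changes to keep.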

    Most likely you're doing all this refactoring without any tests to back you up, so you might end up throwing it all away at the end for fear of regressions. This is not a problem at all, as the act of doing the refactoring will have been a valuable learning experience.

    It's also worth trying tools to generate a UML class diagram (most IDEs can do this for you). For a large codebase, an auto-generated UML diagram will probably be huge and chaotic, but it at least gives you a bird's-eye view of how everything fits together. You could also auto-generate an ER diagram if the software uses an RDB.

    Another way to learn about a project that seems obvious but is often neglected: ask somebody! No matter how old a piece of software is or how many generations of developers it has passed through, there must be somebody out there who knows something about it. Ask around and gather any clues you can. This includes not only engineers but people on the business side as well. If it's an in-house app, you might be able to talk directly to the software's users. For example, a user might be able to tell you that a certain feature is obsolete and no longer needed, allowing you to delete a large chunk of dead code.

    Junilu, it might interest you to know that I live in Tokyo. Unfortunately I can't afford to eat at Jiro, but I've met my fair share of shokunin.

    Short answer: all the time!

    Long answer: it depends on what you mean by re-engineering.

    If we're talking about day-to-day refactoring, this should be done any time that you're writing code. It should become synonymous with coding, so much so that you do it almost unconsciously. It should also be done as soon as possible after new code is written. The longer a piece of code lives, the more difficult it becomes to refactor.

    For more serious re-engineering work, such as replacing a whole component (e.g. the persistence layer of an application) or even a full rewrite, things are more complicated.

    Firstly I'd want to be sure that it's really in everybody's interest to take on such a large project. Is there long-term business value in making this change to the software, or are the developers just scratching an itch? You'll need buy-in from all the stakeholders if you want the re-engineering project to succeed. Otherwise you'll get halfway through, a 'more important project' will come along, and the project will be killed.

    Second, you need to choose a good time to start. If there's a lull in development coming up, that's a perfect opportunity. Freeze all development and focus entirely on the re-engineering effort. If the project is being actively developed and this kind of freeze is not possible, you can start anytime you like, but you'll have to re-engineer against a moving target. In this case, I would suggest splitting the re-engineering into many tiny phases, re-integrating with the master branch after each phase, so your re-engineering branch never strays too far from reality.

    In the case of a full rewrite, I would think long and hard before attempting it. I guarantee it will end up being a lot more work than you first imagine! I would only consider a full rewrite after the avenue of refactoring has been thoroughly exhausted and a 3rd-party solution has been ruled out. A complete paradigm shift (such as changing the implementation language or switching away from an obsolete application framework) may also be a justification for rewriting.

    Hi Tanja,

    Wow, that sounds painful! The first thing I would do is add a thin wrapper around the code that runs your queries, one that writes the SQL of every query to a log file. This will make things a lot easier than having to run the debugger every time. Once you have a wrapper layer in place, you could do other cool stuff like timing query execution and outputting warning logs for slow queries.
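A minimal sketch of such a wrapper, using `java.util.logging`. `LoggingQueryRunner` and `SqlAction` are hypothetical names, and the slow-query threshold is an assumed constructor parameter; the idea is that every existing call site hands its query to `run(...)`, so the SQL and its timing are logged in one place.

```java
import java.sql.SQLException;
import java.time.Duration;
import java.util.logging.Logger;

// A unit of database work; unlike java.util.function.Supplier it may throw SQLException.
@FunctionalInterface
interface SqlAction<T> {
    T execute() throws SQLException;
}

// Thin wrapper that every query is routed through, so the SQL (and its timing)
// ends up in the log instead of having to be inspected in a debugger.
final class LoggingQueryRunner {
    private static final Logger LOG = Logger.getLogger(LoggingQueryRunner.class.getName());

    private final Duration slowQueryThreshold;

    LoggingQueryRunner(Duration slowQueryThreshold) {
        this.slowQueryThreshold = slowQueryThreshold;
    }

    <T> T run(String sql, SqlAction<T> action) throws SQLException {
        LOG.info(() -> "Executing SQL: " + sql);
        long start = System.nanoTime();
        try {
            return action.execute();
        } finally {
            // Runs whether the query succeeded or threw, so slow failures are logged too.
            long elapsedMillis = Duration.ofNanos(System.nanoTime() - start).toMillis();
            if (elapsedMillis > slowQueryThreshold.toMillis()) {
                LOG.warning(() -> "Slow query (" + elapsedMillis + " ms): " + sql);
            }
        }
    }
}
```

At a call site, the lambda passed to `run` would wrap the existing JDBC code that executes `sql` and reads the `ResultSet`, so the wrapper can be introduced one call site at a time.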

    Second, running SQL queries using raw strings instead of prepared statements invites SQL injection, so you should work to fix that ASAP. It would probably be difficult to jump straight from the current state to a full-blown ORM such as Hibernate, so I wouldn't recommend that. As a starting point, I would try implementing some helpers to take care of building PreparedStatements, with a nice fluent interface to make the code easier to read. I'm thinking something like:



    If you make this StatementBuilder immutable, you will be able to share a half-built SQL statement between different queries, thus reducing copy-pasted code.

    If you have to build a lot of complex WHERE clauses, you might want to create a Condition or Filter class that can take a StatementBuilder and return a new one with the appropriate WHERE clauses added. E.g.:
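A minimal sketch of how an immutable, fluent `StatementBuilder` and a `Filter` might fit together. All names and methods here (`select`, `from`, `where`, `prepare`, `Filter.and`) are hypothetical; the builder only covers simple `SELECT`/`FROM`/`WHERE` queries, and values are always bound through `?` placeholders rather than concatenated into the SQL.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Immutable: every method that "changes" the builder returns a new instance,
// so a half-built statement can be shared safely between different queries.
final class StatementBuilder {
    private final String columns;
    private final String table;
    private final List<String> conditions; // combined with AND
    private final List<Object> params;     // values for the ? placeholders, in order

    private StatementBuilder(String columns, String table,
                             List<String> conditions, List<Object> params) {
        this.columns = columns;
        this.table = table;
        this.conditions = Collections.unmodifiableList(new ArrayList<>(conditions));
        this.params = Collections.unmodifiableList(new ArrayList<>(params));
    }

    static StatementBuilder select(String columns) {
        return new StatementBuilder(columns, null, Collections.emptyList(), Collections.emptyList());
    }

    StatementBuilder from(String table) {
        return new StatementBuilder(columns, table, conditions, params);
    }

    StatementBuilder where(String condition, Object... values) {
        List<String> newConditions = new ArrayList<>(conditions);
        newConditions.add(condition);
        List<Object> newParams = new ArrayList<>(params);
        newParams.addAll(Arrays.asList(values));
        return new StatementBuilder(columns, table, newConditions, newParams);
    }

    String sql() {
        StringBuilder sb = new StringBuilder("SELECT ").append(columns).append(" FROM ").append(table);
        if (!conditions.isEmpty()) {
            sb.append(" WHERE ").append(String.join(" AND ", conditions));
        }
        return sb.toString();
    }

    List<Object> params() {
        return params;
    }

    // Values are bound with setObject, never concatenated into the SQL text.
    // The caller is responsible for closing the returned statement.
    PreparedStatement prepare(Connection conn) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(sql());
        try {
            for (int i = 0; i < params.size(); i++) {
                ps.setObject(i + 1, params.get(i)); // JDBC parameter indexes start at 1
            }
            return ps;
        } catch (SQLException e) {
            ps.close();
            throw e;
        }
    }
}

// A reusable WHERE-clause fragment: takes a builder and returns a new one.
@FunctionalInterface
interface Filter {
    StatementBuilder applyTo(StatementBuilder builder);

    // Applies this filter, then the other one.
    default Filter and(Filter other) {
        return builder -> other.applyTo(applyTo(builder));
    }
}
```

For example, `Filter active = b -> b.where("status = ?", "ACTIVE");` can be combined with other filters via `and` and applied to a shared base such as `StatementBuilder.select("id, name").from("users")`. Because the base is never modified, it can be reused for other queries, which is where the reduction in copy-pasted code comes from.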



    Of course, any refactoring you do to the DB queries should be accompanied by tests. You will need to set up a test harness where you can run queries against a real DB, check that the results are what you expect, and set up and tear down the DB data in between tests. You might want to use an in-memory DB such as H2 or HSQLDB for easier setup and faster test execution.

    The aim of the book is quite an ambitious one: I want to cover everything (within reason) that you can do to make a legacy project better than it currently is. This includes refactoring and tests, but that's only a part of it. I try to cover the whole package, including: quality metrics and visibility; infrastructure automation; workflow and team communication; the decision about whether to rewrite or refactor; re-architecting; and various other miscellanea.

    The book does have one large chapter devoted to refactoring and testing, but apart from that I've decided to leave it to the experts. Martin Fowler and Michael Feathers have already covered this topic in great detail, and there's not much more I can usefully add.

    Wow, so many great questions and discussions!

    I'm calling it a day for now, but I promise I'll get around to them all.

    Trying to define 'legacy' always reminds me of US Supreme Court Justice Potter Stewart discussing the definition of obscene material:

    I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description, and perhaps I could never succeed in intelligibly doing so. But I know it when I see it



    Here's my stab at a definition: Legacy code is any code whose existence hinders any work performed upon it.

    We've all had that all-too-rare experience of working on excellent code. It's a joy to work with: it seems to positively invite the change that you want to implement, quietly guiding you towards the most elegant possible solution. The support framework is in place to allow you to add a test for your new code with a minimum of fuss, and the design gives you confidence that your change will not inadvertently affect the surrounding code. Working with this project reminds you why you chose to be a programmer in the first place!

    Unfortunately, 99% of the code that I have worked with (written either by me or by someone else) doesn't have this mystical property of "excellence". Every time you want to make a change, you find that some piece of existing code gets in the way somehow. Either it's impossible to write a test, or you can write a test but you can't test your change the way you wanted to. Perhaps you find that there's already code that does something very similar to what you want to do, but it's just coupled enough to other components to make it impossible to reuse.

    In other words, according to my definition above, the vast majority of code is legacy code, since it hinders you whenever you try to change it. And most of us are writing legacy code most of the time!

    I agree wholeheartedly! The sooner you refactor, the easier it is. Seriously.

    Before you write a single line of code, take some time to do a little 'mental refactoring'. Go through the design in your head, think about how easy it will be to understand, to test, to extend, to change in response to changing requirements. Think about how requirements are likely to change in the future. This is the easiest refactoring of all, because you don't even have to change any code!

    Refactoring is still pretty easy until you commit the code and share it with your co-workers. Take a moment to look through the code and refactor it for readability before you run that 'git push'. Try to anticipate what the code reviewer is going to complain about, and fix it before they get the chance.

    After this point, refactoring becomes a little more tricky because somebody might start depending on the code you want to refactor. You'll need to let others know what you're doing, perhaps via code reviews.

    After that, as time passes, more and more code will end up depending on and coupling with your code, so it will become more and more difficult and risky to refactor. The moral of the story is, refactor as soon as you can!