Suvarna Garge (Editor)

Diff Text

Developer(s): DiffEngineX LLC
Written in: C#
Type: Data comparison
Initial release: Oct 29, 2012
Operating system: Any
License: Closed source

Diff-Text

Diff-Text is a free-to-use software tool that finds the differences between two blocks of plain text. It takes the form of a collection of web pages, each with a slightly different layout. The text to be compared is pasted directly into the web page, so the tool can be used from any operating system.


Diff-Text was developed by DiffEngineX LLC and uses improved versions of algorithms originally developed for the spreadsheet comparison tool DiffEngineX.

It allows the user to choose between comparing at the level of whole lines (or paragraphs), words or characters. When comparing whole lines, Diff-Text reports only the fact that a line is absent from the other block, not what changed within it. Diff-Text considers a paragraph to be any line ending with a Windows (CRLF), Macintosh (CR) or Unix (LF) line terminator.
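The three comparison levels can be illustrated with a minimal sketch that splits a block of text into lines, words or characters. The function name `split_units` and the `level` values are hypothetical, chosen only for illustration; this is not Diff-Text's own code.

```python
import re

def split_units(text, level):
    """Split text into comparison units at the chosen granularity.

    `level` is an illustrative parameter: "line", "word" or "char".
    """
    if level == "line":
        # Match CRLF first so a Windows terminator is not split in two,
        # then a bare CR (classic Macintosh) or a bare LF (Unix).
        return re.split(r"\r\n|\r|\n", text)
    if level == "word":
        # Whitespace-separated words; runs of whitespace collapse.
        return text.split()
    if level == "char":
        return list(text)
    raise ValueError(f"unknown level: {level!r}")
```

For example, `split_units("a\r\nb\rc\nd", "line")` yields four units regardless of which terminator separates them.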

The website can combine the original and modified text blocks into one pane with all differences highlighted. Alternatively, the marked-up original and modified text blocks can be displayed in separate panes.

Navigation from one difference to the next is supported.

None of the above features is unique; all of them can be found in other text comparison tools.

The software can display just the differences, the differences with a variable amount of context on either side, or the whole marked-up text.
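The idea of a variable amount of context can be sketched with Python's standard `difflib` module, whose `unified_diff` function takes the number of context lines as its `n` argument. This uses a different tool purely to illustrate the concept; the wrapper name `diff_with_context` is hypothetical.

```python
import difflib

def diff_with_context(original, modified, context=1):
    """Return unified-diff lines showing `context` unchanged lines
    on either side of each difference."""
    a = original.splitlines()
    b = modified.splitlines()
    # n=0 shows only the differences; a large n shows the whole text.
    return list(difflib.unified_diff(a, b, lineterm="", n=context))

original = "a\nb\nc\nd\ne"
modified = "a\nb\nX\nd\ne"
for line in diff_with_context(original, modified, context=1):
    print(line)
```

Setting `context=0` corresponds to showing just the differences, while a value at least as large as the text corresponds to showing the whole marked-up text.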

The website supports HTTPS (SSL/TLS), so confidential text can be submitted for comparison over an encrypted connection.

The algorithm used by Diff-Text is also used by Selection Diff Tool, an app for Microsoft Word and Excel 2013.

Limitations Of Using The Longest Common Subsequence Algorithm

The unique feature of Diff-Text is its ability to spot text that has been moved up or down in the document and placed into a new context. To avoid flagging spurious similarities, the software allows the user to specify the minimum number of adjacent words or characters that will be reported as a move. Text movements are reported such that the number of individual edits needed to transform the original text into the modified text is at a minimum.

The vast majority of text comparison software based on the longest common subsequence (LCS) algorithm incorrectly reports moved text as unlinked additions and deletions. The algorithm finds only the longest in-order sequence of matching text between the two documents. Text that has been moved out of that in-order sequence is not recognized as a match.
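This limitation can be reproduced with a textbook LCS diff. The sketch below is a generic dynamic-programming implementation written for illustration (the name `lcs_diff` is hypothetical); it is not the code of any particular tool.

```python
def lcs_diff(a, b):
    """Return (op, token) pairs, op in {'=', '-', '+'}, from a textbook LCS table."""
    m, n = len(a), len(b)
    # table[i][j] = length of the LCS of a[i:] and b[j:]
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start, emitting keep/delete/insert operations.
    ops, i, j = [], 0, 0
    while i < m and j < n:
        if a[i] == b[j]:
            ops.append(("=", a[i])); i += 1; j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            ops.append(("-", a[i])); i += 1
        else:
            ops.append(("+", b[j])); j += 1
    ops.extend(("-", t) for t in a[i:])
    ops.extend(("+", t) for t in b[j:])
    return ops

original = "alpha beta gamma delta epsilon".split()
modified = "gamma delta epsilon alpha beta".split()
print(lcs_diff(original, modified))
# [('-', 'alpha'), ('-', 'beta'), ('=', 'gamma'), ('=', 'delta'),
#  ('=', 'epsilon'), ('+', 'alpha'), ('+', 'beta')]
```

The words "alpha beta" were only moved to the end, yet the output lists them once as a deletion and once as an addition, with nothing linking the two.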

Heuristics are not used. Any similarity between the two documents above the specified minimum will be reported (if move detection is selected). This is the main difference between Diff-Text and most other text comparison algorithms. Diff-Text will always match up significant similarities, even if they are contained within non-identical or moved lines. It never resorts to guessing or to accepting the first match that happens to be found, which may result in non-optimal matches elsewhere.
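The role of the user-specified minimum can be sketched as follows. The source does not publish Diff-Text's algorithm, so this is only an assumed illustration: it exhaustively lists every maximal run of equal adjacent tokens of at least `min_len`, wherever the run sits in either text, and makes no attempt to choose a minimal set of edits. The name `common_runs` is hypothetical.

```python
def common_runs(a, b, min_len):
    """Return (start_in_a, start_in_b, length) for every maximal run of
    equal adjacent tokens whose length is at least min_len."""
    runs = []
    prev = [0] * (len(b) + 1)  # prev[j] = run length ending at a[i-1], b[j-1]
    for i in range(len(a)):
        cur = [0] * (len(b) + 1)
        for j in range(len(b)):
            if a[i] == b[j]:
                cur[j + 1] = prev[j] + 1
                # The run cannot be extended if either text ends here
                # or the next pair of tokens differs.
                at_end = (i + 1 == len(a) or j + 1 == len(b)
                          or a[i + 1] != b[j + 1])
                if at_end and cur[j + 1] >= min_len:
                    k = cur[j + 1]
                    runs.append((i - k + 1, j - k + 1, k))
        prev = cur
    return runs

original = "alpha beta gamma delta epsilon".split()
modified = "gamma delta epsilon alpha beta".split()
print(common_runs(original, modified, min_len=2))
# [(0, 3, 2), (2, 0, 3)]
```

Raising `min_len` to 3 drops the two-word run and leaves only `(2, 0, 3)`, which shows how a higher minimum suppresses short, possibly spurious, matches.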

Not only can Diff-Text spot whole paragraphs that have been moved up or down in a document, it can also spot sentences that have been reordered within a paragraph. To indicate moved text, the background color of the text changes to light blue and yellow.

If the user specifies that text movements should not be detected, the algorithm runs in O(m log n) time, an improvement over the quadratic time often seen in software of this type. Here m and n are the sizes of the original and modified texts.
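The source does not say which algorithm achieves this bound. As a point of comparison only, one well-known way to avoid filling a full quadratic table is the Hunt-Szymanski approach, sketched below with a binary search per matching pair. Its cost depends on the number of matching token pairs, so it approaches m log n only when most tokens are distinct; it should not be read as Diff-Text's method. The name `lcs_length` is hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict

def lcs_length(a, b):
    """Length of the longest common subsequence, Hunt-Szymanski style."""
    # Index every position at which each token occurs in b.
    positions = defaultdict(list)
    for j, token in enumerate(b):
        positions[token].append(j)
    # tails[k] = smallest end index in b of a common subsequence of length k + 1
    tails = []
    for token in a:
        # Visit matches right to left so one token of a is not counted twice.
        for j in reversed(positions.get(token, ())):
            k = bisect_left(tails, j)  # binary search: the log factor
            if k == len(tails):
                tails.append(j)
            else:
                tails[k] = j
    return len(tails)

print(lcs_length("alpha beta gamma delta epsilon".split(),
                 "gamma delta epsilon alpha beta".split()))  # 3
```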

Summary

Conventional text comparison tools based on the longest common subsequence algorithm can miss many similarities between the original and modified files when blocks of text have been moved around. Diff-Text is systematic and allows the user to specify the minimum number of contiguous words or characters to be considered a valid move.
