4.3 Find Duplicates

Find Duplicates, under the Edit menu, identifies duplicate copies of sequences and other documents. Duplicates can be identified by sequence name, database ID (e.g. accession), or residues/bases. The Search Scope can be set to check within a selected set of documents, all documents in a folder, or the sequences of a single alignment or sequence list.
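
As an illustration of the three matching criteria, the following Python sketch groups records by a chosen key and reports any group with more than one member. It is not the application's implementation; the Record type, the find_duplicates function and the sample data are hypothetical, and comparing residues case-insensitively is an assumption made for this example.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical stand-in for a sequence document."""
    name: str
    accession: str
    residues: str


def find_duplicates(records, key):
    """Group records by key(record); keep only groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Made-up sample data for illustration only.
records = [
    Record("seqA", "AB000001", "ACGT"),
    Record("seqB", "AB000002", "acgt"),
    Record("seqA", "AB000003", "TTGA"),
]

# One key function per matching criterion.
by_name = find_duplicates(records, key=lambda r: r.name)
by_accession = find_duplicates(records, key=lambda r: r.accession)
# Assumption: residues are compared case-insensitively.
by_residues = find_duplicates(records, key=lambda r: r.residues.upper())
```

The search scope corresponds to the list of records passed in: a selection, the contents of a folder, or the sequences of one alignment or sequence list.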

When searching for duplicates within sequences of a single alignment or sequence list, two options are available for displaying results once the search has run:

If you are searching for duplicates within a folder or multiple selected documents, you can choose to select either the most recently or least recently modified copy.
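
The choice between the most recently and least recently modified copy can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: the Doc type, the pick_copy function, the keep parameter values and the dates are hypothetical and do not reflect how the application stores or compares modification dates.

```python
from datetime import datetime
from typing import NamedTuple


class Doc(NamedTuple):
    """Hypothetical stand-in for a document with a modification date."""
    name: str
    modified: datetime


def pick_copy(duplicates, keep="most_recent"):
    """Return one copy from a group of duplicates by modification date."""
    if keep == "most_recent":
        return max(duplicates, key=lambda d: d.modified)
    if keep == "least_recent":
        return min(duplicates, key=lambda d: d.modified)
    raise ValueError(f"unknown option: {keep!r}")


# Made-up group of duplicate documents.
group = [
    Doc("copy 1", datetime(2021, 3, 1)),
    Doc("copy 2", datetime(2022, 7, 15)),
    Doc("copy 3", datetime(2020, 11, 30)),
]

newest = pick_copy(group, keep="most_recent")
oldest = pick_copy(group, keep="least_recent")
```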

Remove Duplicate Reads

For identifying non-exact duplicates, removing exact duplicates from large data sets, or removing duplicates on paired read data sets, use Remove Duplicate Reads... from the Sequence menu. This tool runs Dedupe from the BBTools suite.

For a detailed explanation of any Dedupe setting, hover the mouse over the setting, or click the help (question mark) button next to the custom options under More Options.