Ask questions via twitter! Message any question to @answers on twitter. We'll publish the question and send you a reply each time there's a new answer.
Next Question

Answered Question

 
M$3 September 01, 2009 10:32 PM

In excel, how can you eliminate rows based on duplicate values in one column?

I have an excel sheet with multiple columns that has one column with unique identifying numbers. Several of the rows are duplicates and can be identified by having the same number in that unique identifying column. I want to eliminate the duplicate rows.

I have tried using advanced filter, but that doesn't quite do the trick.

Thanks!
Interesting Question?  Yes (0)   No (0)   
RSS
 
 

Best Answer  Chosen by Asker

 
September 01, 2009 11:26 PM
It's very easy and I have used this method many a time. Here is how from Microsoft website:

If you are using Microsoft Office Excel 2003/Microsoft Excel 2002

"A duplicate row (also called a record) in a list is one where all values in the row are an exact match of all the values in another row. To delete duplicate rows, you filter a list for unique rows, delete the original list, and then replace it with the filtered list. The original list must have column headers.

Caution Because you are permanently deleting data, it's a good idea to copy the original list to another worksheet or workbook before using the following procedure.

1. Select all the rows, including the column headers, in the list you want to filter. Click the top left cell of the range, and then drag to the bottom right cell.

2. On the Data menu, point to Filter, and then click Advanced Filter.

3. In the Advanced Filter dialog box, click Filter the list, in place.

4. Select the Unique records only check box, and then click OK.
The filtered list is displayed and the duplicate rows are hidden.

5. On the Edit menu, click Office Clipboard.
The Clipboard task pane is displayed.

6. Make sure the filtered list is still selected, and then click Copy .
The filtered list is highlighted with bounding outlines and the selection appears as an item at the top of the Clipboard.

7. On the Data menu, point to Filter, and then click Show All.
The original list is re-displayed.

8. Press the DELETE key.
The original list is deleted.

9. In the Clipboard, click on the filtered list item.
The filtered list appears in the same location as the original list. "

Check this link :

http://office.microsoft.com/en-us/excel/HA010346261033.aspx?pid=CL100570551033

The page also has many other tips to remove duplicates in coumns, etc.

If you are using Office 2007, I believe it has direct tool to remove duplicates under Data menu:

http://www.flickr.com/photos/tyler/156760775/
Asker's Rating:
• Answer is in the comments - thanks!


Tags: excel, microsoft

Helpful Answer?  (0)   (0)    Tip stanar for this answer
Permalink | Report
   Reply  
 
 
 
September 02, 2009 12:07 AM
Thanks for the very thorough answer. Unfortunately, the rows arent *exact* duplicates, so the filtering doesn't work. I had tried the advanced filter (like I posted earlier), but the *only* part the designates a record as a duplicate is the unique identifier.

I'd need a way, basically, to designate the rows where the unique identifier appears for the second, third, fourth time.

Report
 
 
 
September 02, 2009 12:21 AM
I missed that line where you mentioned about the "advanced filter".

I just tried with a sample data based on your description and it works.

raw data

http://farm3.static.flickr.com/2615/3879940004_dbe8daf70e_m.jpg

filtered data
http://farm3.static.flickr.com/2601/3879143221_75f188ec8d_m.jpg

If you are talking about this kind of duplicates
http://farm3.static.flickr.com/2495/3879170855_2eb1d53931_m.jpg

then "Autofilter" under Data, Filter can be used. select the unique identifier as the filter and delete any rows that you don't want.

http://farm4.static.flickr.com/3470/3879967688_2e4abdae8c_m.jpg

if you have data something like this:
http://farm4.static.flickr.com/3508/3879245559_24a2d9306c_m.jpg

you can use "adv filter": after selecting data range without the unique column (if it is ok).

http://farm3.static.flickr.com/2622/3880042688_2525cf5bbe_m.jpg

hope this helps, if not can you post here a screen shot of a sample data (similar looking ) with a few rows?

Report
 
 
 
September 02, 2009 01:00 AM
In your raw data screenshot before, if you change row 3, field 3 to "bike" for instance, and then try to get it to eliminate records based on field 2 only. That is, have only one kind of field 2, that would be the best approximation of what I'm looking at.

The problem is that the rows aren't *exact* duplicates. In reality, they differ only very slightly, but enough so that the advanced filter doesn't work. There is only one column (the unique identifier one) that will tell me if the row needs removing.

Report
 
 
 
September 02, 2009 01:13 AM
"There is only one column (the unique identifier one) that will tell me if the row needs removing. "

In that case, I believe autofilter can be used and then delete the rows the same unique identifier like here:
http://farm3.static.flickr.com/2588/3879264147_52bcc88339_m.jpg

http://farm3.static.flickr.com/2674/3879272719_eee7342040_m.jpg

Report
 
 
 
September 02, 2009 01:29 AM
We're almost there, I think. The problem is that there are ~20k rows, so just getting all the ones that are duplicates doesn't work.

What I need is a way to flag all the duplicate rows - namely the time "car" appears the second (and third and fourth) times.

Report
 
 
 
September 02, 2009 02:44 AM
Check if any of these work for you:
Excel Worksheet Functions

I would still like to see a 10 row sample data from you to use a suitable function to filter.

Report
 
 
 
September 02, 2009 05:36 PM
Perfect - there's a good answer from there. The countif function, with the starter row unlocked will do the trick.

Thanks!

Report
 
 

Other Answers (1)

Sort By
 
September 01, 2009 11:07 PM
Could you just do a data sort (select all the columns with data first), then sort by the column with the unique identifyier? Once sorted, you could highlight all the duplicates (hold down the control key and click on the gray portion to the left of Column A to select the full row, and multiple rows), then right-click and delete entire row(s)?

That's the only way I've ever known how to do it - it may be cumbersome if you have a very large amount of data but if you only have a hundred rows or so it should be manageable.

Best of luck!

Helpful Answer?  (0)   (0)    Tip iheartjuppy for this answer
Permalink | Report
   Reply  
 
 
 
September 02, 2009 12:05 AM
I wish I could do it this way, but there are ~20k entries. Don't think it will quite work like that - thanks though!

Report
 
 

Answer this Question

How tips and payments work

This question has already been resolved. You may add an answer to it but you will not be eligible to win best answer or any associated tips.

Ask a Question


140 characters left
Top of Page
Buy Mahalo Dollars with Credit Card or PayPal

Top Members

This Week All Time
  • buddawiggi
    buddawiggi
    2nd Degree Black Belt
    26710 Points
    M$774.34 Earned
  • kty2777
    kty2777
    Purple Belt with a Brown Tip
    5216 Points
    M$196.67 Earned
  • opher
    opher
    Purple Belt
    3852 Points
    M$152.92 Earned
   See All
 

Most Popular Tags

mahalo(1564)
iphone(459)
music(458)
google(346)
food(310)
online(287)
beer(277)
money(260)
movies(249)
apple(249)
aotd(235)
health(217)
video(200)
free(200)
dog(200)
   See All
 

Categories

Welcome New Members


 
 
Mahalo Dollars are the currency of Mahalo Answers.

Each Mahalo Dollar costs $1.

Once you earn more than 40 Mahalo Dollars, you can request to be paid via PayPal. Each Mahalo Dollar is currently worth $0.75 when paid out via PayPal. Learn More

 
 

Please log in to use this function.