How To Clean Date Data In Excel
08-01-2017,05:51 PM #1
Registered User
How to clean up and format very messy date data (mix of text and date)
Hi everyone,There are LOTS of posts regarding Excel date formatting (172 pages or so) but tons of skimming, I didn't see my situation--so apologies in advance if this is a duplicate thread.
I'm helping a nonprofit organization with their records and came across a VERY messy dataset for their donations. I can only imagine how many volunteers entered the data using their own data entry preferences.
This data has nearly 2400 rows, where each row is a donation and should have a date. While nearly all of the rows have a date, the formats vary widely. My goal is to get all the dates into some sort of common format and I can later use for sorting and in other functions. No luck after many hours of fiddling with it.
Any help is immensely appreciated! I've extracted the dates into a sample file and attached it here.I have 4 categories of dates to get into a clean, common format.
1. Most of the dates were entered as MM-DD-YY (or YYYY), although Excel is reading them as DD-MM-YYYY. That isn't so bad to fix, but is complicated with the other date data entries.
2. Some date info looks numeric (MM-DD-YY), but is text. Again, not so bad to fix, but is troublesome given the other type of date data in the column.
3. Also, in many rows, the date info contains inconsistently formatted alphanumeric data that I haven't been able to nicely identify and convert.
4. Lastly, there are a few random rows that are not date information at all or has non-date text mixed with date info--the non-date text doesn't belong in the date column and needs its own column.Below is an example of the messy dates with Cat. matching the 4 categories above and testing to see which rows are text v numeric data:
Cat. IsText Date
1 FALSE 04-10-06
2 TRUE 09-16-2004
1 FALSE 03-09-05
1 FALSE 08-03-06
3 TRUE Nov. �02
3 TRUE Jan/Feb 2011
2 TRUE 2/16/03
2 TRUE 2/22/04
1 FALSE 01-02-05
4 TRUE 8/12/04 (scholarship)
4 TRUE PaidI tried:
Cell formatting features, but it didn't work for me due to the different text & numeric format types in each cell.
Text-to-Columns to help separate and format each piece, but again the mixed formats caused trouble.
Many text conversion formulas found on this and other forums, but most just produced #Value results.Unfortunately, I don't know VBA, but I might be able to follow a solution that uses it.
Any thoughts on how this can be cleaned up? It would be great to have the dates in (any format) one column and any 'extra' non-date text in a separate column.
For rows that only show a year (or month & year), I don't mind using a 01 proxy for the missing day (or month & day).
For any instances that have more than one month listed, it doesn't matter to me which one is kept and the other can be dropped.
If you find a solution in the sample file, please keep the Count column so I can match the data to my master file.Again, any thoughts are greatly appreciated! Thank you for the time and consideration.
08-01-2017,06:37 PM #2
Re: How to clean up and format very messy date data (mix of text and date)
Enter formula in C2 and copy down
If you like my answer please click on * Add Reputation
Don't forget to mark threads as "Solved" if your problem has been resolved"Nothing is so firmly believed as what we least know."
--Michel de Montaigne
08-01-2017,07:43 PM #3
Re: How to clean up and format very messy date data (mix of text and date)
This is much longer. Attempts to remove additional characters.Filter on this column for 1905, 1900, etc. and do those manually. I found only 80.
08-04-2017,11:24 AM #4
Registered User
Re: How to clean up and format very messy date data (mix of text and date)
Thank you very much to AlKey and FlameRetired for the assist. However, I my results from both solutions didn't quite match want I need. Perhaps I miscommunicated.In the example here, I only use the written month to illustrate the difference in the needed and previous solution dates.
AlKey had a great first pass, while FlameRetired picked up many of the dates that were entered as text and extracting the text date that had other non-date text in the cell. However, FlameRetired's formula flipped the months & days for numeric dates, while working well for text dates. Both formulas either passed over or dropped non-date text.Quite the head scratcher, but I REALLY appreciate the help. Those are some clever formulas!
Last edited by RUJedi; 08-04-2017 at 12:02 PM. Reason: Put example feedback into a table for easier viewing.
08-04-2017,07:27 PM #5
Re: How to clean up and format very messy date data (mix of text and date)
RUJedi thank you for the detailed feedback. I/we really appreciate when the OP does that.I couldn't get the days/months to flip at my end. Those were all correct
I was able to take care of most of the year only "dates". With this formula there are only 3 items left to clean up manually.
I was not aware that you wanted formula to extract the "notes" part. Unfortunately this formula only returns a portion of the extra notes. Many of the strings are only partial. After applying this formula filtering and manually correcting those might be your only option.
08-05-2017,07:44 PM #6
Registered User
Re: How to clean up and format very messy date data (mix of text and date)
Thanks again FlameRetired.Looks like we're getting there, but I found a few bugs in your recent and previous formulas.
The first one did something weird that I didn't catch on your initial reply. When a date was initially entered as numeric, your formula produced the right date. However, when it was entered as text, your formula treated the day as the year, dropped the actual year, and added a 1 for the day which was not there previously. Example below. I tinkered to change dash (-) and slash (/) separators in the source cell, but it didn't seem to make any difference in the result.
Help Help type FR's forumula Reformatted What it should be Comments 07-09-01 number 37141 07-Sep-01 07-Sep-01 This works 03/30/05 text 11018 1-Mar-30 30-Mar-05 Used the day as the year, dropped the actual year, used 1 as the day which was not in the original date
Your second formula did as great job extracting the non-date text from the date. I saw what you meant about sometime extracting only part of the text. There are not many cells that have both date and non-date text, so I should be able to manually correct them. Thanks! I might also have to manually correct a few instances of incorrect dates it produced, but that isn't a big deal. For reference of incorrect dates, it did this:
Help FR's forumula Reformatted What it should be 2nd column Comments Deposited in Bratislava in 2001 2001 23-Jun-05 01-01-2001 in Bratislava in 2001 Looks like the formula found the year as text and put that as the unformatted date, which then made the wrong date when formatted. I can fix the 2nd column.
I'll keep looking up if there are ways to recognize and fix the month & day flips. Thanks again.
Last edited by RUJedi; 08-05-2017 at 07:46 PM. Reason: added column in table so show if text or number.
08-06-2017,12:21 AM #7
Re: How to clean up and format very messy date data (mix of text and date)
RUJedi I am getting the right date on 03/30/05 at my end. See the attachment.I am not sure how to resolve special exceptions like Deposited in Bratislava in 2001 in formula. I've reached the point where solution of each exception defeats solutions already solved.
Have a look at the attached.
08-07-2017,05:07 PM #8
Registered User
Re: How to clean up and format very messy date data (mix of text and date)
Thanks again FlameRetired. Your help is much appreciated.
Did you get the right date for 3/30/05? I downloaded your file and the original data of 3/30/05 (March 30th 2005) appears on row 18, consistent with my sample file that I applied your formula. In your file, C18 shows 01-03-30 (March 1st 2030). Where else did you see the 03/30/05 converted properly?Curious--I notice your use of ROW($1:$30) nested within the larger date converting formula, although the formula was copied and operates down all 2373 rows. What was the purpose of the Row 1 through 30 range, but more specifically--why did you stop it at 30 rather than row 2373? At a first glance, using ROW$1$30 produces the same output as ROW$1$2373.
No worries on extra solutions for the special exceptions. There are few enough that I should be able to fix them manually. Your formula to extract them in a separate column was the real heavy lifting.
Thanks again!
08-07-2017,07:02 PM #9
Re: How to clean up and format very messy date data (mix of text and date)
In Count numbers 1345 and 1993.Where else did you see the 03/30/05 converted properly?
I don't find 01-03-30 anywhere in the 'messy' dates column and there are no dates in column C in the year 2030. They only go as high as 11/2/2017 (Count number 5). Although I just noticed in Count number 497 that Nov. 02 failed to convert properly.C18 shows 01-03-30 (March 1st 2030)
The ROW($1:$30) doesn't apply to spreadsheet rows. It applies to the num_chars argumentCurious--I notice your use of ROW($1:$30) nested within the larger date converting formula, although the formula was copied and operates down all 2373 rows. What was the purpose of the Row 1 through 30 range, but more specifically--why did you stop it at 30 rather than row 2373?
LEFT(text, [ num_chars ])
ROW($1:$30) returns multiple increasingly longer strings starting at one character until exhausting the length of the source text. I chose 30 because it was sufficiently large enough to exhaust the length of the text. Its value from there is arbitrary.
I you type in a temporary cell E19 (Count 18) this formula you will see what I mean.
=LEFT(B19,ROW($1:$30))
Then click in the formula bar and press the F9 function key you will see ...
{"3";"3/";"3/3";"3/30";"3/30/";"3/30/0";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05";"3/30/05"}Once converted to numbers
{3;#VALUE!;42797;42824;#VALUE!;36615;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441;38441}
LOOKUP ignores errors and returns the right most number in that array that is less than 1E306 (a sufficiently large number guaranteed to be larger than anything in the array).
Hope that answers.
Glad to hear these have worked out, and thank you for the feedback.
08-09-2017,12:09 AM #10
Registered User
Re: How to clean up and format very messy date data (mix of text and date)
Very clever use of the ROW$1:$30. Thanks for the detailed explanation.
08-09-2017,02:17 AM #11
Re: How to clean up and format very messy date data (mix of text and date)
You're welcome. Thanks for the feedback.
How To Clean Date Data In Excel
Source: https://www.excelforum.com/excel-general/1195227-how-to-clean-up-and-format-very-messy-date-data-mix-of-text-and-date.html
Posted by: groveloung1959.blogspot.com
0 Response to "How To Clean Date Data In Excel"
Post a Comment