#1
Find and Replace anomaly
I have used Find and Replace many times to replace extra paragraph marks, and the paragraph marks that occur at the end of every line (typical of text copied from web pages). Now I am in a situation where a government representative (we are in a regulated industry) wants to see where we keep a particular regulation for reference. The twist is that the regulation is not available from the government in printable form, yet the web site's copy is not considered adequate for our records. That leaves me to copy from the web site and attempt to make it into a document.

I have done this before, but this one is different. I have replaced all styles with Normal (for now; I will apply custom styles later), used a macro to remove all hyperlinks, and used Find and Replace to remove all graphics.

Here's the problem: I cannot use Find and Replace to replace a succession of paragraph marks with a single paragraph mark. I can do it in any other document, but not in this one. If I copy two successive paragraphs from the document to a new document I get the same result (Find doesn't identify them as two paragraphs), but if I add paragraphs to the new document with the Enter key, Find and Replace works as it should. Similarly, when I add empty paragraphs to the troublesome document I can find them as I would expect.

I have tried a wildcard search (Find ^13{2,}, Replace With ^p) and a search without wildcards (Find ^p^p, Replace With ^p). No luck. If I search for a single paragraph mark I can find every one, including both in the pair. If I replace every paragraph mark with, say, a £, then attempt to replace every instance of ££ with £, I get the same problem as with the paragraphs: it does not recognize it as a pair. There is nothing such as a space between the paragraphs. I have removed all manual formatting, hyperlinks, graphics, etc. In short, everything in the document is part of the extended ASCII character set. I replaced ^13 with ^p (with wildcards) and ^p with ^p (without wildcards). I copied the entire document to Notepad, then opened that with Word. In every case, same result. Anybody have an idea as to what is going on here?
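To see what "two paragraph marks in a row" means at the character level, the wildcard search above (Find ^13{2,}, Replace With ^p) can be mimicked outside Word. This is a minimal sketch, assuming the document text has been extracted so that each paragraph mark is a carriage return, chr(13), which is how Word's own Range.Text reports it. Python's `re` syntax is not Word's wildcard syntax, and both function names are hypothetical. The second function flags pairs of paragraph marks separated only by non-printing characters: they look adjacent on screen, yet a search for ^p^p cannot match them.

```python
import re

PARA = "\r"  # Word paragraph mark, chr(13): ^p, or ^13 in a wildcard search


def collapse_paragraph_marks(text: str) -> str:
    """Rough equivalent of wildcard Find ^13{2,} / Replace With ^p."""
    return re.sub(r"\r{2,}", PARA, text)


def near_miss_pairs(text: str) -> list[tuple[int, str]]:
    """Find paragraph marks separated only by non-printing characters.

    Returns (offset of the first mark, code points of the characters between).
    The lookahead keeps the second mark unconsumed so that runs are all checked.
    """
    hits = []
    for m in re.finditer(r"\r(?=([^\r]+)\r)", text):
        between = m.group(1)
        if all(not ch.isprintable() for ch in between):
            hits.append((m.start(), " ".join(f"U+{ord(ch):04X}" for ch in between)))
    return hits
```

For example, in "Part 1\r\rPart 2\r\u200b\rPart 3\r" the first pair collapses, but the second pair survives because a zero-width space sits between the marks, and `near_miss_pairs` reports it.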
#2
Hi Bruce,
The characters could be manual line breaks instead of paragraph marks. The Find code is ^l (a lowercase ell). Turn on nonprinting character display and check the line ends. A manual break looks like a left-pointing arrow with a hooked tail. See http://word.mvps.org/FAQs/Formatting/CleanWebText.htm for more help.

--
Regards,
Jay Freedman
Microsoft Word MVP
FAQ: http://word.mvps.org
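Besides checking the screen display, you can inventory the break characters directly. This is a small Python sketch, assuming the text was pulled out of Word in a way that preserves its control characters (as Range.Text does: paragraph mark chr(13), manual line break chr(11)). The name `describe_breaks` is hypothetical. Anything else that is not printable, such as a non-breaking space from a web page, is reported by its code point.

```python
from collections import Counter

# Characters behind some of Word's Find codes, as they appear in extracted text.
BREAK_NAMES = {
    "\t": "tab (^t)",
    "\x0b": "manual line break (^l)",
    "\x0c": "page or section break",
    "\r": "paragraph mark (^p)",
}


def describe_breaks(text: str) -> dict[str, int]:
    """Count break characters and any other non-printing characters."""
    counts = Counter()
    for ch in text:
        if ch in BREAK_NAMES:
            counts[BREAK_NAMES[ch]] += 1
        elif not ch.isprintable():  # ordinary spaces count as printable
            counts[f"U+{ord(ch):04X}"] += 1
    return dict(counts)
```

If the report shows manual line breaks rather than paragraph marks, searching for ^l instead of ^p is the fix suggested above.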
#3
----- Original Message -----
From: "BruceM"
Newsgroups: microsoft.public.word.docmanagement
Sent: Monday, January 17, 2005 11:43 AM
Subject: Find and Replace anomaly

Bruce,
The best way to help yourself would probably be to provide the URL of the page you're attempting to convert. CSS and HTML have formatting options for spacing and such that go beyond Word's formatting options. One alternative may be to print the web page to a PDF file, retaining all formatting in the process.
#4
Thanks for taking the time to reply. I guess I should have mentioned that I displayed nonprinting characters. I finally figured out what it was (sort of); at any rate, I made the problem go away. Scattered throughout the document was a sort of right-angle arrow pointing up (a graphic, not a line break), followed by the word "top" as a hyperlink, followed by a paragraph mark. I used ^g to get rid of the graphics, then replaced "top" in the Hyperlink character style with "top" in Normal style, then replaced top^p with nothing. It was the paragraph mark after the hyperlink that went weird on me.
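One caution about replacing top^p with nothing: a plain search for top^p also matches the tail of any paragraph that ends in a word such as "stop", deleting those letters and gluing the paragraph to the next one. A slightly safer variant deletes only paragraphs whose entire text is "top". This is a Python sketch under the same assumption as before (paragraph marks appear as "\r" in the extracted text); `remove_top_paragraphs` and `naive_remove` are hypothetical names, and the latter exists only to show the pitfall.

```python
PARA = "\r"  # paragraph mark as it appears in extracted Word text


def remove_top_paragraphs(text: str, label: str = "top") -> str:
    """Drop paragraphs whose whole text is `label` (e.g. "back to top" links)."""
    paragraphs = text.split(PARA)
    kept = [p for p in paragraphs if p != label]
    return PARA.join(kept)


def naive_remove(text: str) -> str:
    """Mirrors Replace "top^p" with nothing, for comparison."""
    return text.replace("top" + PARA, "")
```

On "Section 1\rtop\rDo not stop\rNext rule\r", the naive version yields "Do not sNext rule" in the middle, while `remove_top_paragraphs` leaves "Do not stop" intact.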
#5
Could it have been a text-wrapping break? They look a lot like line breaks, and AFAIK there is no way to search for them using Find.

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA
Word MVP FAQ site: http://word.mvps.org
Email cannot be acknowledged; please post all follow-ups to the newsgroup so all may benefit.
#6
No, not text-wrapping. What was so strange was that no matter what character I substituted for the paragraph marks, Find could not locate the one that was originally on the line with the hyperlink.
#7
One way to get hold of these unidentifiable characters is to build an exhaustive list of the characters they are not, preceded by an exclamation mark, such as [!A-Za-z0-9]. It is not difficult to create an exhaustive list. In a duplicate document, get rid of characters until almost nothing remains. Start with, say, [A-Za-z0-9], replace that with nothing, then add the recognizable characters that remain to the list and repeat. Once you have the list, add the exclamation mark and use it in a wildcard search in the original document.
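The same "everything except" idea can be tried on extracted text. This is a Python sketch, not the poster's procedure: Python writes a negated class as [^...] where Word's wildcards use [!...], and `leftover_chars` is a hypothetical helper. You widen `known` step by step until only the mystery characters are left.

```python
import re


def leftover_chars(text: str, known: str = "A-Za-z0-9") -> list[str]:
    """Return the distinct characters NOT covered by the bracket class `known`.

    `known` is inserted verbatim into [^...], so ], \\, ^ and - must be
    escaped by the caller, just as in any regex character class.
    """
    return sorted(set(re.findall(f"[^{known}]", text)))


sample = "Rule 1.\r\xa0top\r"
# First pass: space, period, and paragraph mark show up along with the suspect.
print([f"U+{ord(c):04X}" for c in leftover_chars(sample)])
# Second pass: the recognizable ones are added, leaving only the suspect.
print([f"U+{ord(c):04X}" for c in leftover_chars(sample, r"A-Za-z0-9 .\r")])
```

Reporting code points rather than the characters themselves matters here, since the whole problem is characters that do not show on screen.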
#8
Without knowing what is in the document, here is my shot in the dark. Paragraph marks usually come after periods, exclamation marks and question marks. Change ? to £ and ! to ¥. Then, with wildcards enabled, replace ([.£¥])^13 with \1§. Now remove the remaining paragraph marks by replacing ^13 with nothing. Finally, replace £ with ?, ¥ with !, and § with a paragraph mark (^p).
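The steps above can be sketched in Python on extracted text (paragraph marks as "\r"). Two deliberate departures, both assumptions rather than part of the post. First, the £/¥ swap is skipped: it was presumably needed because ? and ! have special meanings in Word's wildcard syntax, whereas inside a Python bracket class they are literal. Second, the § placeholder is replaced by a private-use character (U+E000), because the section sign is common in regulations and must not already occur in the text; the same caution applies when doing this in Word. The optional `joiner` (hypothetical) lets you put a space where a removed line end stood in for one; the post replaces with nothing.

```python
import re

MARKER = "\ue000"  # private-use placeholder, standing in for the post's §


def rejoin_wrapped_lines(text: str, joiner: str = "") -> str:
    """Keep paragraph marks that follow . ? or ! and remove all others."""
    # Protect sentence-ending marks: ([.?!])\r  ->  \1 + placeholder
    text = re.sub(r"([.?!])\r", r"\1" + MARKER, text)
    # Remove the remaining (mid-sentence) paragraph marks.
    text = text.replace("\r", joiner)
    # Turn the placeholders back into paragraph marks.
    return text.replace(MARKER, "\r")
```

With the default `joiner`, "applies\rto" becomes "appliesto"; passing `joiner=" "` gives "applies to", which is usually what web text needs.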