Cleaning up text pasted from the Web

The ease of copying and pasting text from Web sites and email greatly simplifies many tasks in Word, but problems often arise in making the pasted text conform to the style of the document into which it is pasted. One of the most common chores is getting rid of excess line breaks, which cause the text to wrap short of the right margin. There are several ways to work around this problem.

Assessing the problem text

The most efficient method of reformatting short lines of text depends on whether the breaks are line breaks or paragraph breaks. So the first line of attack must be to display nonprinting characters using the Show/Hide button on the Standard toolbar. (For more on nonprinting characters, see What do all those funny marks, like the dots between the words in my document, and the square bullets in the left margin, mean?) If each line ends in a pilcrow or paragraph mark (¶), then AutoFormat may be all you need. If each line ends in a bent arrow (↵), signifying a line break, you will need to use a different approach.

Using AutoFormat

No matter what other options you have enabled on the AutoFormat tab of Tools | AutoCorrect, when you select a block of text with a paragraph break at the end of each full line, AutoFormat will delete all the paragraph breaks but the last.

Unfortunately, text pasted from the Web or email nowadays rarely has lines ending in paragraph breaks. But you can force this format by using Paste Special and selecting “Unformatted Text” (in Word 2002, if you have “Paste Options” enabled, you can just Paste and then select the “Keep Text Only” option). This pastes your selection with paragraph breaks instead of line breaks, and AutoFormat will then do the trick.

Using Find and Replace

Sometimes, however, you will not want to paste as unformatted text. In that case, what you will most likely get is text with a line break at the end of each line. Provided there is an empty line at the end of each paragraph, cleanup is still relatively simple. It takes just two Find and Replace operations.

First pass

1. Press Ctrl+H to open the Find and Replace dialog.
2. In the “Find what” box, type ^l^l (those are lowercase Ls, representing two line breaks).
3. In the “Replace with” box, type ^p (the code for a paragraph break).
4. Replace All. You will now have a paragraph break at the end of each true paragraph.

Second pass

1. In the “Find what” box, type ^l.
2. In the “Replace with” box, type a space.
3. Replace All.

This removes the line breaks and allows text to wrap naturally.
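For readers who prefer to clean the text outside Word, the two passes can be sketched in Python. This is only an illustration of the logic, not Word's own mechanism: it assumes the text is held in a plain string in which a line break (what ^l matches) is a vertical tab and a paragraph break (what ^p inserts) is a carriage return. The function name unwrap_pasted_text is made up for this example.

```python
# Assumptions for this sketch: a manual line break is modeled as a
# vertical tab and a paragraph mark as a carriage return.
LINE_BREAK = "\v"
PARA_BREAK = "\r"


def unwrap_pasted_text(text: str) -> str:
    """Mimic the two Find and Replace passes on a plain string."""
    # First pass: two consecutive line breaks (an empty line) mark the
    # end of a true paragraph, so they become one paragraph break.
    text = text.replace(LINE_BREAK * 2, PARA_BREAK)
    # Second pass: every remaining line break becomes a space so the
    # text can wrap naturally.
    return text.replace(LINE_BREAK, " ")


sample = "The quick brown\vfox jumps over\vthe lazy dog.\v\vA second\vparagraph."
cleaned = unwrap_pasted_text(sample)
```

As in Word, the order matters: if the single line breaks were replaced first, the empty lines that mark the paragraph ends would be lost.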

Harder cases

If there is not an empty line between paragraphs, you will probably have to insert paragraph breaks by hand. If the amount of text is not large, you can scroll through and press Enter where a paragraph break is needed. Then replace each line break with a space. This will leave an extra space at either the beginning or the end of each paragraph. You can use Find and Replace again to replace <space>^p or ^p<space> (as appropriate) with ^p. (Note that “<space>” represents pressing the spacebar; you don’t type “<space>”.)
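The same cleanup can be sketched on a plain Python string. As an assumption for illustration only, a line break is modeled as a vertical tab and a paragraph break as a carriage return; the function name join_lines_and_trim is hypothetical.

```python
# Assumptions for this sketch: a manual line break is modeled as a
# vertical tab and a paragraph mark as a carriage return.
LINE_BREAK = "\v"
PARA_BREAK = "\r"


def join_lines_and_trim(text: str) -> str:
    """Replace line breaks with spaces, then drop the stray space
    left next to each hand-inserted paragraph break."""
    text = text.replace(LINE_BREAK, " ")
    # Equivalent of replacing <space>^p with ^p.
    text = text.replace(" " + PARA_BREAK, PARA_BREAK)
    # Equivalent of replacing ^p<space> with ^p.
    return text.replace(PARA_BREAK + " ", PARA_BREAK)


# Paragraph break typed after the line break: stray space before it.
print(repr(join_lines_and_trim("a\vb\v\rc\vd")))
# Paragraph break typed before the line break: stray space after it.
print(repr(join_lines_and_trim("a\vb\r\vc")))
```

Running both replacements is harmless, so the sketch does not need to know on which side of the line break Enter was pressed.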

An alternative approach is to press Shift+Enter to insert an extra line break at the end of each paragraph, then follow the instructions in the “Using Find and Replace” section above.

Even when the amount of text is very large, there is no really good alternative to manual editing. But if you Paste Special as Unformatted Text and run AutoFormat, you may find that Word is almost as clever as you are at finding where a paragraph ends.

Note that the methods described above are suitable only for simple text. If you have copied and pasted an entire Web page, with graphics, tables, and frames, much more work will be required to format it for use in a Word document.

Other non-printing characters worth replacing

Often when you paste from the Web, and also from some other applications, characters come in that display as paragraph marks but don’t behave as “proper” paragraph breaks should; they behave like manual line breaks. So you might find that when you center a “paragraph”, several adjacent paragraphs also get centered. To cure this, do a Find and Replace: in the “Find what” box, type ^013; in the “Replace with” box, type ^p; then click “Replace All”.

When pasting from the Web, nonbreaking spaces often come in, rather than ordinary spaces. To get rid of them, do a Find and Replace: in the “Find what” box, type ^s; in the “Replace with” box, insert a <space> character (press the spacebar); then click “Replace All”.
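Outside Word, the nonbreaking-space replacement is a one-line substitution. The sketch below assumes the pasted text is available as a Python string and that the nonbreaking spaces are the Unicode character U+00A0; the function name is made up for this example.

```python
# U+00A0 is the Unicode no-break space, the character ^s matches in Word.
NBSP = "\u00a0"


def replace_nonbreaking_spaces(text: str) -> str:
    """Swap every nonbreaking space for an ordinary space."""
    return text.replace(NBSP, " ")


pasted = "When\u00a0pasting\u00a0from\u00a0the\u00a0Web"
print(replace_nonbreaking_spaces(pasted))
```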

If you want to automate any of the above steps you can record them using the macro recorder and play them back as needed.

Neat tip

The following tip has appeared in Woody’s Office Watch (WOW). When you cut and paste text from a Web site, there are often leading spaces at the beginning of each line. A very quick way to remove all these spaces is to select the text, center justify the selection and then left justify the selection. All the extra spaces will have disappeared.
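If the text is being cleaned before it reaches Word, a similar result can be had with a short Python sketch. It does not reproduce what Word does when you center and left justify; it simply strips ordinary leading spaces from each line of a plain string, assuming the lines are separated by newline characters. The function name strip_leading_spaces is hypothetical.

```python
def strip_leading_spaces(text: str) -> str:
    """Remove ordinary spaces from the start of every line."""
    # Assumption: lines are separated by "\n" in the plain-text copy.
    return "\n".join(line.lstrip(" ") for line in text.split("\n"))


pasted = "   First line\n      Second line\nThird line"
print(strip_leading_spaces(pasted))
```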
