/usr/share/bibledit-gtk/site/gtk/reference/menu/menu-preferences/filters/sed.html is in bibledit-gtk-data 4.9-1.
This file is owned by root:root, with mode 0o644.
The actual contents of the file can be viewed below.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 | <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link href="../../../../../bibledit.css" rel="stylesheet" type="text/css" /><!--
Copyright (©) 2003-2011 Teus Benschop and Contributors to the Wiki.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
or any later version published by the Free Software Foundation;
with no Invariant Sections, no Front-Cover Texts, and no Back-Cover
Texts. A copy of the license is included in the section entitled "GNU
Free Documentation License" in the file FDL.
-->
<title></title>
</head>
<body>
<div id="menu">
<ul>
<li>
<a href="../../../../../home.html">1 Bibledit</a>
</li>
<li>
<a href="../filters.html">Filters</a>
</li>
<li style="list-style: none; display: inline">
<hr />
</li>
<li>
<a href="tec.html">TECkit</a>
</li>
<li>
<a href="regex.html">regex</a>
</li>
<li>sed
</li>
</ul>
</div>
<div id="content">
<h1>
sed
</h1>
<p>
This reference was assembled while closely looking at the information provided in <a href="http://www.grymoire.com/Unix/Sed.html" rel="nofollow">Sed - An Introduction and Tutorial</a>. In March 2008, the author, Bruce Barnett, gave permission to use that information for the reference.
</p>
<h3>
<a name="theessentialcommandsforsubstitution" href="" id="theessentialcommandsforsubstitution"></a>The essential command: s for substitution
</h3>
<p>
The substitute command is the s. It changes all occurrences of a regular expression into another value. A simple example changes "day" to "night":
</p>
<pre>
s/day/night/
</pre>
<p>
This substitute command consists of four parts.<br />
</p>
<pre>
s substitute command<br />/../../ delimiters<br />day regular expression pattern string<br />night replacement string
</pre>
<p>
The <a href="https://sites.google.com/site/bibledit/gtk/reference/menu/menu-preferences/filters/regex">regular expressions</a> are covered in detail in another reference.
</p>
<h3>
<a name="theslashasadelimiter" href="" id="theslashasadelimiter"></a>The slash as a delimiter
</h3>
<p>
The character after the <i>s</i> is the delimiter. Conventionally it is a slash, but it can be anything that you want.
</p>
<p>
For example, it can be an underscore instead of the slash:
</p>
<pre>
s_day_night_
</pre>
<p>
Or colons:
</p>
<pre>
s:day:night:
</pre>
<p>
Or you can use the "|" character.
</p>
<pre>
s|day|night|
</pre>
<p>
Select the one that you want. As long as it is not in the string that you are looking for, anything works. Also remember that three delimiters are needed. If you get a "Unterminated 's' command" it is because you are missing one of them.
</p>
<h3>
<a name="usingampasthematchedstring" href="" id="usingampasthematchedstring"></a>Using & as the matched string
</h3>
<p>
At times one may want to search for a pattern and add some characters, like parenthesis, around or near the pattern that was found. It is easy to do this if looking for a particular string:
</p>
<pre>
s/abc/(abc)/
</pre>
<p>
This won't work if you don't know exactly what you will find. How can you put the string you found in the replacement string if you don't know what it is?
</p>
<p>
The solution requires the special character "&." It corresponds to the pattern found.
</p>
<pre>
s/[a-z]*/(&)/
</pre>
<p>
There can be any number of "&" in the replacement string. You could also double a pattern, e.g. the first number of a line:
</p>
<pre>
s/[0-9]*/& &/
</pre>
<p>
If the input for this rule is "123 abc", then the output will be "123 123 abc".
</p>
<p>
Let's slightly amend this example. Sed will match the first string, and make it as greedy as possible. The first match for '[0-9]*' is the first character on the line, as this matches zero of more numbers. So if the input was "abc 123" the output would be unchanged, except for a space before the letters. A better way to duplicate the number is to make sure it matches a number:
</p>
<pre>
s/[0-9][0-9]*/& &/
</pre>
<p>
The input of "123 abc" becomes "123 123 abc".
</p>
<h3>
<a name="using1tokeeppartofthepattern" href="" id="using1tokeeppartofthepattern"></a>Using \1 to keep part of the pattern
</h3>
<p>
The use of "(" ")" and "1" has been described in the reference on <a href="https://sites.google.com/site/bibledit/gtk/reference/menu/menu-preferences/filters/regex">regular expressions.</a> To review, the escaped parenthesis (that is, parenthesis with backslashes before them) remember portions of the regular expression. The "\1" is the first remembered pattern, and the "\2" is the second remembered pattern. If you wanted to keep the first word of a line, and delete the rest of the line, mark the important part with the parenthesis:
</p>
<pre>
s/\([a-z]*\)*/\1/
</pre>
<p>
If you want to switch two words around, you can remember two patterns and change the order around:
</p>
<pre>
s/\([a-z]*\) \([a-z]*\)/\2 \1/<br />
</pre>
<p>
The "\1" doesn't have to be in the replacement string. It can be in the pattern you are searching for. If you want to eliminate duplicated words, you can try:
</p>
<pre>
s/\([a-z]*\)\1/\1/<br />
</pre>
<p>
You can have up to nine values: "\1" thru "\9."
</p>
<h3>
<a name="substituteflags" href="" id="substituteflags"></a>Substitute Flags
</h3>
<p>
You can add additional flags after the last delimiter. These flags can specify what happens when there is more than one occurrence of a pattern on a single line, and what to do if a substitution is found.
</p>
<h3>
<a name="gglobalreplacement" href="" id="gglobalreplacement"></a>/g - Global replacement
</h3>
<p>
Most Unix utilties work on files, reading a line at a time. <i>Sed</i>, by default, is the same way. If you tell it to change a word, it will only change the first occurrence of the word on a line. You may want to make the change on every word on the line instead of the first. For an example, let's place parentheses around words on a line. Instead of using a pattern like "[A-Za-z]*" which won't match words like "won't," we will use a pattern, "[^ ]*," that matches everything except a space. Well, this will also match anything because "*" means <b>zero or more</b>. To work around a possible bug in sed, you must avoid matching the null string when using the "g" flag to <i>sed</i>. A work-around example is: "[^ ][^ ]*." The following will put parenthesis around the first word:
</p>
<pre>
s/[^ ]*/(&)/<br />
</pre>
<p>
If you want it to make changes for every word, add a "g" after the last delimiter and use the work-around:
</p>
<pre>
s/[^ ][^ ]*/(&)/g<br />
</pre>
<h3>
<a name="issedrecursive" href="" id="issedrecursive"></a>Is sed recursive?
</h3>
<p>
<i>Sed</i> only operates on patterns found in the incoming data. That is, the input line is read, and when a pattern is matched, the modified output is generated, and the <b>rest</b> of the input line is scanned. The "s" command will not scan the newly created output. That is, you don't have to worry about expressions like:
</p>
<pre>
s/loop/loop the loop/g<br />
</pre>
<p>
This will not cause an infinite loop. If a second "s" command is executed, it could modify the results of a previous command. I will show you how to execute multiple commands later.
</p>
<h3>
<a name="12etcspecifyingwhichoccurrence" href=""></a>/1, /2, etc. Specifying which occurrence
</h3>
<p>
With no flags, the first pattern is changed. With the "g" option, all patterns are changed. If you want to modify a particular pattern that is not the first one on the line, you could use "\(" and "\)" to mark each pattern, and use "\1" to put the first pattern back unchanged. This next example keeps the first word on the line but deletes the second:
</p>
<pre>
s/\([a-zA-Z]*\) \([a-zA-Z]*\) /\1 /<br />
</pre>
<p>
This rule is ugly! There is an easier way to do this. You can add a number after the substitution command to indicate you only want to match that particular pattern. Example:
</p>
<pre>
s/[a-zA-Z]* //2<br />
</pre>
<p>
You can combine a number with the g (global) flag. For instance, if you want to leave the first world alone alone, but change the second, third, etc. to DELETED, use /2g:
</p>
<pre>
s/[a-zA-Z]* /DELETED /2g<br />
</pre>
<p>
Don't get /2 and \2 confused. The /2 is used at the end. \2 is used in inside the replacement field.
</p>
<p>
Note the space after the "*" character. Without the space, <i>sed</i> will run a long, long time. (Note: this bug is probably fixed by now.) This is because the number flag and the "g" flag have the same bug. You should also be able to use the pattern
</p>
<pre>
s/[^ ]*//2<br />
</pre>
<p>
but this also eats CPU.<br />
</p>
<p>
The number flag is not restricted to a single digit. It can be any number from 1 to 512. If you wanted to add a colon after the 80th character in each line, you could type:
</p>
<pre>
s/./&:/80<br />
</pre>
<h3>
<a name="combiningsubstitutionflags" href="" id="combiningsubstitutionflags"></a>Combining substitution flags
</h3>
<p>
You can combine flags when it makes sense. Some things that don't work are using a number and a "g" flag, because this is inconsistent.
</p>
<h3>
<a name="scripts" href="" id="scripts"></a>Scripts
</h3>
<p>
If you have a large number of <i>sed</i> rules, you can put then into a script.
</p>
<p>
A <i>sedscript</i> could look like this:
</p>
<pre>
# sed comment - This script changes lower case vowels to upper case<br />s/a/A/g<br />s/e/E/g<br />s/i/I/g<br />s/o/O/g<br />s/u/U/g<br />
</pre>
<p>
When there are several commands in one file, each command must be on separate line.
</p>
<h3>
<a name="comments" href="" id="comments"></a>Comments
</h3>
<p>
<i>Sed</i> comments are lines where the first non-white character is a "#." On many systems, <i>sed</i> can have only one comment, and it must be the first line of the script. On modern systems, you can have several comment lines anywhere in the script.
</p>
<h3>
<a name="multiplerulesandorderofexecution" href="" id="multiplerulesandorderofexecution"></a>Multiple rules and order of execution
</h3>
<p>
As we explore more of the commands of <i>sed</i>, the commands will become complex, and the actual sequence can be confusing. It's really quite simple. Each line is read in. The various commands, in order specified by the user, has a chance to operate on the input line. After the substitutions are made, the next command has a chance to operate on the same line, which may have been modified by earlier commands. If you ever have a question, the best way to learn what will happen is to just try it out. If a complex command doesn't work, make it simpler. If you are having problems getting a complex script working, break it up into two smaller scripts and do them one by one.
</p>
<h3>
<a name="addressesandrangesoftext" href="" id="addressesandrangesoftext"></a>Addresses and Ranges of Text
</h3>
<p>
You have only learned one command, and you can see how powerful <i>sed</i> is. However, all it is doing is a search and substitute. That is, the substitute command is treating each line by itself, without caring about nearby lines. What would be useful is the ability to restrict the operation to certain lines. Some useful restrictions might be:
</p>
<p>
- Specifying a line by its number.
</p>
<p>
- Specifying a range of lines by number.
</p>
<p>
- All lines containing a pattern.
</p>
<p>
- All lines from the beginning of a file to a regular expression.
</p>
<p>
- All lines from a regular expression to the end of the file.
</p>
<p>
- All lines between two regular expressions.<br />
</p>
<p>
<i>Sed</i> can do all that and more. Every command in <i>sed</i> can be assigned an address, range or restriction like the above examples. The restriction or address immediately precedes the command:
</p>
<pre>
<i>restriction</i> <i>command</i><br />
</pre>
<h3>
<a name="restrictingtoalinenumber" href="" id="restrictingtoalinenumber"></a>Restricting to a line number
</h3>
<p>
The simplest restriction is a line number. If you wanted to delete the first number on line 3, just add a "3" before the command:
</p>
<pre>
3 s/[0-9][0-9]*//<br />
</pre>
<h3>
<a name="patterns" href="" id="patterns"></a>Patterns
</h3>
<p>
Many Unix utilities use a slash to search for a regular expression. <i>Sed</i> uses the same convention, provided you terminate the expression with a slash. To delete the first number on all lines that start with a "#," use:
</p>
<pre>
/^#/ s/[0-9][0-9]*//<br />
</pre>
<p>
I placed a space after the "/<i>expression</i>/" so it is easier to read. It isn't necessary, but without it the command is harder to fathom. <i>Sed</i> does provide a few extra options when specifying regular expressions. But I'll discuss those later. If the expression starts with a backslash, the next character is the delimiter. To use a comma instead of a slash, use:
</p>
<pre>
\,^#, s/[0-9][0-9]*//<br />
</pre>
<p>
The main advantage of this feature is searching for slashes. Suppose you wanted to search for the string "/usr/local/bin" and you wanted to change it for "/common/all/bin." You could use the backslash to escape the slash:
</p>
<pre>
/\/usr\/local\/bin/ s/\/usr\/local/\/common\/all/<br />
</pre>
<p>
It would be easier to follow if you used an underline instead of a slash as a search. This example uses the underline in both the search command and the substitute command:
</p>
<pre>
\_/usr/local/bin_ s_/usr/local_/common/all_<br />
</pre>
<p>
This illustrates why <i>sed</i> scripts get the reputation for obscurity. I could be perverse and show you the example that will search for all lines that start with a "g," and change each "g" on that line to an "s:"
</p>
<pre>
/^g/s/g/s/g<br />
</pre>
<p>
Adding a space and using an underscore after the substitute command makes this <b>much</b> easier to read:
</p>
<pre>
/^g/ s_g_s_g<br />
</pre>
<p>
Er, I take that back. It's hopeless. There is a lesson here: Use comments liberally in a <i>sed</i> script. Comments are a Good Thing. You may have understood the script perfectly when you wrote it. But six months from now it could look like a confusing thing.
</p>
<h3>
<a name="rangesbylinenumber" href="" id="rangesbylinenumber"></a>Ranges by line number
</h3>
<p>
You can specify a range on line numbers by inserting a comma between the numbers. To restrict a substitution to the first 100 lines, you can use:
</p>
<pre>
1,100 s/A/a/<br />
</pre>
<p>
If you know exactly how many lines are in a file, you can explicitly state that number to perform the substitution on the rest of the file. In this case, assume that there are 532 lines in the file:
</p>
<pre>
101,532 s/A/a/<br />
</pre>
<p>
An easier way is to use the special character "$," which means the last line in the file.
</p>
<pre>
101,$ s/A/a/<br />
</pre>
<p>
The "$" is one of those conventions that mean "last" in several utilities.
</p>
<h3>
<a name="rangesbypatterns" href="" id="rangesbypatterns"></a>Ranges by patterns
</h3>
<p>
You can specify two regular expressions as the range. Assuming a "#" starts a comment, you can search for a keyword, remove all comments until you see the second keyword. In this case the two keywords are "start" and "stop:"
</p>
<pre>
/start/,/stop/ s/#.*//<br />
</pre>
<p>
The first pattern turns on a flag that tells <i>sed</i> to perform the substitute command on every line. The second pattern turns off the flag. If the "start" and "stop" pattern occurs twice, the substitution is done both times. If the "stop" pattern is missing, the flag is never turned off, and the substitution will be performed on every line until the end of the file.
</p>
<p>
You should know that if the "start" pattern is found, the substitution occurs on the same line that contains "start." This turns on a switch, which is line oriented. That is, the next line is read. If it contains "stop" the switch is turned off. Otherwise, the substitution command is tried. Switches are line oriented, and not word oriented.
</p>
<p>
You can combine line numbers and regular expressions. This example will remove comments from the beginning of the file until it finds the keyword "start:"
</p>
<pre>
1,/start/ s/#.*//<br />
</pre>
<p>
This example will remove comments everywhere except the lines <b>between</b> the two keywords:
</p>
<pre>
1,/start/ s/#.*//' -e '/stop/,$ s/#.*//<br />
</pre>
<p>
The last example has a range that overlaps the "/start/,/stop/" range, as both ranges operate on the lines that contain the keywords. I will show you later how to restrict a command up to, <b>but not including</b> the line containing the specified pattern.
</p>
<p>
Before I start discussing the various commands, should show know that some commands cannot operate on a range of lines. I will let you know when I mention the commands. I will describe three commands in this secton. One of the three cannot operate on a range.
</p>
<h3>
<a name="deletewithd" href="" id="deletewithd"></a>Delete with d
</h3>
<p>
Using ranges can be confusing, so you should expect to do some experimentation when you are trying out a new script. A useful command deletes every line that matches the restriction: "d." If you want to look at the first 10 lines of a file, you can use:
</p>
<pre>
11,$ d<br />
</pre>
<p>
which is similar in function to the <i>head</i> command. If you want to chop off the header of a mail message, which is everything up to the first blank line, use:
</p>
<pre>
1,/^$/ d<br />
</pre>
<p>
The range for deletions can be regular expressions pairs to mark the begin and end of the operation. Or it can be a single regular expression. Deleting all lines that start with a "#" is easy:
</p>
<pre>
/^#/ d<br />
</pre>
<p>
Removing comments and blank lines takes two commands. The first removes every character from the "#" to the end of the line, and the second deletes all blank lines:
</p>
<pre>
s/#.*//
</pre>
<pre>
/^$/ d<br />
</pre>
<p>
A third one should be added to remove all blanks and tabs immediately before the end of line:
</p>
<pre>
s/#.*//<br />s/[ ^I]*$//<br />/^$/ d<br />
</pre>
<p>
The character "^I" is a <i>CRTL-I</i> or tab character. You would have to explicitly type in the tab. Note the order of operations above, which is in that order for a very good reason. Comments might start in the middle of a line, with white space characters before them. Therefore comments are first removed from a line, potentially leaving white space characters that were before the comment. The second command removes all trailing blanks, so that lines that are now blank are converted to empty lines. The last command deletes empty lines. Together, the three commands removes all lines containing only comments, tabs or spaces.
</p>
<p>
This demonstrates the pattern space <i>sed</i> uses to operate on a line. The actual operation <i>sed</i> uses is:
</p>
<p>
- Copy the input line into the pattern space.
</p>
<p>
- Apply the first <i>sed</i> command on the pattern space, if the address restriction is true.
</p>
<p>
- Repeat with the next sed expression, again operating on the pattern space.
</p>
<p>
- When the last operation is performed, write out the pattern space and read in the next line from the input file.<br />
</p>
<h3>
<a name="reversingtherestrictionwith" href="" id="reversingtherestrictionwith"></a>Reversing the restriction with !
</h3>
<p>
Sometimes you need to perform an action on every line except those that match a regular expression, or those outside of a range of addresses. The "!" character, which often means not in Unix utilities, inverts the address restriction.
</p>
<h3>
<a name="theqorquitcommand" href="" id="theqorquitcommand"></a>The q or quit command
</h3>
<p>
There is one more simple command that can restrict the changes to a set of lines. It is the "q" command: quit. the third way to duplicate the head command is:
</p>
<pre>
11 q<br />
</pre>
<p>
which quits when the eleventh line is reached. This command is most useful when you wish to abort the editing after some condition is reached.
</p>
<p>
The "q" command is the one command that does not take a range of addresses. Obviously the command
</p>
<pre>
1,10 q<br />
</pre>
<p>
cannot quit 10 times. Instead
</p>
<pre>
1 q<br />
</pre>
<p>
or
</p>
<pre>
10 q
</pre>
<p>
is correct.
</p>
<h3>
<a name="groupingwithand" href="" id="groupingwithand"></a>Grouping with { and }
</h3>
<p>
The curly braces, "{" and "}," are used to group the commands.
</p>
<p>
Hardly worth the build up. All that prose and the solution is just matching squigqles. Well, there is one complication. Since each <i>sed</i> command must start on its own line, the curly braces and the nested <i>sed</i> commands must be on separate lines.
</p>
<p>
Previously, I showed you how to remove comments starting with a "#." If you wanted to restrict the removal to lines between special "begin" and "end" key words, you could use:<br />
</p>
<pre>
/begin/,/end/ {<br />s/#.*//<br />s/[ ^I]*$//<br />/^$/ d<br />p<br />}
</pre>
<p>
These braces can be nested, which allow you to combine address ranges. You could perform the same action as before, but limit the change to the first 100 lines:<br />
</p>
<pre>
1,100 {<br />/begin/,/end/ {<br />s/#.*//<br />s/[ ^I]*$//<br />/^$/ d<br />p<br />}<br />}
</pre>
<p>
You can place a "!" before a set of curly braces. This inverts the address, which removes comments from all lines <b>except</b> those between the two reserved words:<br />
</p>
<pre>
/begin/,/end/ !{<br />s/#.*//<br />s/[ ^I]*$//<br />/^$/ d<br />p<br />}
</pre>
<h3>
<a name="thecommentcommand" href="" id="thecommentcommand"></a>The # Comment Command
</h3>
<p>
As we dig deeper into <i>sed</i>, comments will make the commands easier to follow. Older versions of <i>sed</i> only allow one line as a comment, and it must be the first line. Recent versions allow more than one comment, and these comments don't have to be first. An example could be:
</p>
<h3>
<a name="addingchanginginsertingnewlines" href="" id="addingchanginginsertingnewlines"></a>Adding, Changing, Inserting new lines
</h3>
<p>
<i>Sed</i> has three commands used to add new lines to the output stream. Because an entire line is added, the new line is on a line by itself to emphasize this. There is no option, an entire line is used, and it must be on its own line. If you are familiar with many unix utilities, you would expect <i>sed</i> to use a similar convention: lines are continued by ending the previous line with a "\". The syntax to these commands are finicky, like the "r" and "w" commands.
</p>
<h4>
<a name="appendalinewitha" href="" id="appendalinewitha"></a>Append a line with 'a'
</h4>
<p>
The "a" command appends a line after the range or pattern. This example will add a line after every line with "WORD:
</p>
<pre>
/WORD/ a\
</pre>
<pre>
Add this line after every line with WORD
</pre>
<h4>
<a name="insertalinewithi" href="" id="insertalinewithi"></a>Insert a line with 'i'
</h4>
<p>
You can insert a new line before the pattern with the "i" command:
</p>
<pre>
/WORD/ i\
</pre>
<pre>
Add this line before every line with WORD
</pre>
<h4>
<a name="changealinewithc" href="" id="changealinewithc"></a>Change a line with 'c'
</h4>
<p>
You can change the current line with a new line.
</p>
<pre>
/WORD/ c\
</pre>
<pre>
Replace the current line with the line
</pre>
<p>
A "d" command followed by a "a" command won't work, as I discussed earlier. The "d" command would terminate the current actions. You can combine all three actions using curly braces:<br />
</p>
<pre>
/WORD/ {<br />i\<br />Add this line before<br />a\<br />Add this line after<br />c\<br />Change the line to this one<br />}
</pre>
<h3>
<a name="leadingtabsandspacesinasedscript" href="" id="leadingtabsandspacesinasedscript"></a>Leading tabs and spaces in a sed script
</h3>
<p>
<i>Sed</i> ignores leading tabs and spaces in all commands. However these white space characters may or may not be ignored if they start the text following a "a," "c" or "i" command. If you want to keep leading spaces, and not care about which version of <i>sed</i> you are using, put a "\" as the first character of the line:
</p>
<pre>
a\
</pre>
<pre>
\ This line starts with a tab
</pre>
<h3>
<a name="addingmorethanoneline" href="" id="addingmorethanoneline"></a>Adding more than one line
</h3>
<p>
All three commands will allow you to add more than one line. Just end each line with a "\:"
</p>
<pre>
/WORD/ a\<br />Add this line\<br />This line\<br />And this line
</pre>
<h3>
<a name="addinglinesandthepatternspace" href="" id="addinglinesandthepatternspace"></a>Adding lines and the pattern space
</h3>
<p>
I have mentioned the pattern space before. Most commands operate on the pattern space, and subsequent commands may act on the results of the last modification. The three previous commands, like the read file command, add the new lines to the output stream, bypassing the pattern space.
</p>
<h3>
<a name="addressrangesandtheabovecommands" href="" id="addressrangesandtheabovecommands"></a>Address ranges and the above commands
</h3>
<p>
Some commands can take a range of lines, and others cannot. To be precise, the commands "a," "i," "r," and "q" will not take a range like "1,100" or "/begin/,/end/." The "c" or change command allows this, and it will let you change several lines into one:
</p>
<pre>
/begin/,/end/ c\<br />***DELETED***<br />
</pre>
<p>
If you need to do this, you can use the curly braces, as that will let you perform the operation on every line:
</p>
<pre>
1,$ {
</pre>
<pre>
a\
</pre>
<pre>
}
</pre>
<h3>
<a name="multilinepatterns" href="" id="multilinepatterns"></a>Multi-Line Patterns
</h3>
<p>
Most UNIX utilities are line oriented. Regular expressions are line oriented. Searching for patterns that covers more than one line is not an easy task. (Hint: It will be very shortly.)
</p>
<p>
<i>Sed</i> reads in a line of text, performs commands which may modify the line, and outputs modification if desired. The main loop of a <i>sed</i> script looks like this:
</p>
<p>
1. The next line is read from the input file and placed in the pattern space. If the end of file is found, and if there are additional files to read, the current file is closed, the next file is opened, and the first line of the new file is placed into the pattern space.
</p>
<p>
2. The line count is incremented by one.
</p>
<p>
3. Each <i>sed</i> command is examined. If there is a restriction placed on the command, and the current line in the pattern space meets that restriction, the command is executed. Some commands, like "n" or "d" cause <i>sed</i> to go to the top of the loop. The "q" command causes <i>sed</i> to stop. Otherwise the next command is examined.
</p>
<p>
3. After all of the commands are examined, the pattern space is output.
</p>
<p>
The restriction before the command determines if the command is executed. If the restriction is a pattern, and the operation is the delete command, then the following will delete all lines that have the pattern:
</p>
<pre>
/PATTERN/ d<br />
</pre>
<p>
If the restriction is a pair of numbers, then the deletion will happen if the line number is equal to the first number or greater than the first number and less than or equal to the last number:
</p>
<pre>
10,20 d<br />
</pre>
<p>
If the restriction is a pair of patterns, there is a variable that is kept for each of these pairs. If the variable is false and the first pattern is found, the variable is made true. If the variable is true, the command is executed. If the variable is true, and the last pattern is on the line, after the command is executed the variable is turned off:
</p>
<pre>
/begin/,/end/ d<br />
</pre>
<p>
Whew! That was a mouthful. Hey, I said it was a review. If you have read the previous lessons, you should have breezed through this. You may want to refer back to this review, because I covered several subtle points.
</p>
<h3>
<a name="transformwithy" href="" id="transformwithy"></a>Transform with y
</h3>
<p>
If you wanted to change a word from lower case to upper case, you could write 26 character substitutions, converting "a" to "A," etc. <i>Sed</i> has a command that operates like the <i>tr</i> program. It is called the "y" command. For instance, to change the letters "a" through "f" into their upper case form, use:
</p>
<pre>
y/abcdef/ABCDEF/<br />
</pre>
<p>
I could have used an example that converted all 26 letters into upper case, and while this column covers a broad range of topics, the "column" prefers a narrower format.
</p>
<p>
If you wanted to convert a line that contained a hexadecimal number (e.g. 0x1aff) to upper case (0x1AFF), you could use:
</p>
<pre>
/0x[0-9a-zA-Z]*/ y/abcdef/ABCDEF<br />
</pre>
<p>
This works fine if there are only numbers in the file. If you wanted to change the second word in a line to upper case, you are out of luck - unless you use multi-line editing.
</p>
<h3>
<a name="displayingcontrolcharacterswithal" href="" id="displayingcontrolcharacterswithal"></a>Displaying control characters with a l
</h3>
<p>
The "l" command prints the current pattern space. It is therefore useful in debugging <i>sed</i> scripts. It also converts unprintable characters into printing characters by outputting the value in octal preceded by a "\" character.
</p>
<h3>
<a name="theholdbuffer" href="" id="theholdbuffer"></a>The Hold Buffer
</h3>
<p>
So far we have talked about three concepts of <i>sed</i>: (1) The input stream or data before it is modified, (2) the output stream or data after it has been modified, and (3) the pattern space, or buffer containing characters that can be modified and send to the output stream.
</p>
<p>
There is one more "location" to be covered: the <i>hold buffer</i> or <i>hold space</i>. Think of it as a spare pattern buffer. It can be used to "copy" or "remember" the data in the pattern space for later. There are five commands that use the hold buffer.
</p>
<h3>
<a name="exchangewithx" href="" id="exchangewithx"></a>Exchange with x
</h3>
<p>
The "x" command eXchanges the pattern space with the hold buffer. By itself, the command isn't useful. Executing the <i>sed</i> command
</p>
<pre>
x<br />
</pre>
<p>
as a filter adds a blank line in the front, and deletes the last line. It looks like it didn't change the input stream significantly, but the <i>sed</i> command is modifying every line.
</p>
<p>
The hold buffer starts out containing a blank line. When the "x" command modifies the first line, line 1 is saved in the hold buffer, and the blank line takes the place of the first line. The second "x" command exchanges the second line with the hold buffer, which contains the first line. Each subsequent line is exchanged with the preceding line. The last line is placed in the hold buffer, and is not exchanged a second time, so it remains in the hold buffer when the program terminates, and never gets printed. This illustrates that care must be taken when storing data in the hold buffer, because it won't be output unless you explicitly request it.
</p>
<h3>
<a name="holdwithhorh" href="" id="holdwithhorh"></a>Hold with h or H
</h3>
<p>
The "x" command exchanges the hold buffer and the pattern buffer. Both are changed. The "h" command copies the pattern buffer into the hold buffer. The pattern buffer is unchanged.
</p>
<h3>
<a name="flowcontrol" href="" id="flowcontrol"></a>Flow Control
</h3>
<p>
As you learn about <i>sed</i> you realize that it has it's own programming language. It is true that it's a very specialized and simple language. What language would be complete without a method of changing the flow control? There are three commands <i>sed</i> uses for this. You can specify a label with an text string followed by a colon. The "b" command branches to the label. The label follows the command. If no label is there, branch to the end of the script. The "t" command is used to test conditions.
</p>
<h3>
<a name="testingwitht" href="" id="testingwitht"></a>Testing with t
</h3>
<p>
You can execute a branch if a pattern is found. You may want to execute a branch only if a substitution is made. The command "t label" will branch to the label if the last substitute command modified the pattern space.
</p>
<p>
One use for this is recursive patterns. Suppose you wanted to remove white space inside parenthesis. These parentheses might be nested. That is, you would want to delete a string that looked like "( ( ( ())) )." The <i>sed</i> expressions
</p>
<pre>
s/([ ^I]*)/g<br />
</pre>
<p>
would only remove the innermost set. You would have to pipe the data through the script four times to remove each set or parenthesis. You could use the regular expression
</p>
<pre>
s/([ ^I()]*)/g<br />
</pre>
<p>
but that would delete non-matching sets of parenthesis. The "t" command would solve this:<br />
</p>
<pre>
:again<br />s/([ ^I]*)//g<br />t again
</pre>
<h3>
<a name="passingregularexpressionsasarguments" href="" id="passingregularexpressionsasarguments"></a>Passing regular expressions as arguments
</h3>
<p>
In the earlier scripts, I mentioned that you would have problems if you passed an argument to the script that had a slash in it. In fact, and regular expression might cause you problems. A script like the following is asking to be broken some day:
</p>
<pre>
s/'"$1"'//g
</pre>If the argument contains any of these characters in it you may get a broken script: "/\.*[]^$." One solution is to have the user put a backslash before any of these characters when they pass it as an argument. However, the user has to know which characters are special.
</div>
</body>
</html>
|