.TH REGEXP 6
.SH NAME
regexp \- regular expression notation
.SH DESCRIPTION
A
.I "regular expression"
specifies
a set of strings of characters.
A member of this set of strings is said to be
.I matched
by the regular expression. In many applications
a delimiter character, commonly
.LR / ,
bounds a regular expression.
In the following specification for regular expressions
the word `character' means any character (rune) but newline.
.PP
The syntax for a regular expression
.B e0
is
.IP
.EX
e3:  literal | charclass | '.' | '^' | '$' | '(' e0 ')'

e2:  e3
  |  e2 REP

REP: '*' | '+' | '?'

e1:  e2
  |  e1 e2

e0:  e1
  |  e0 '|' e1
.EE
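.PP
The grammar can be read as a recursive-descent parser.
The Python sketch below is an illustration only, not the parser behind regexp(2).
The function names and the tuple-shaped tree are hypothetical,
a charclass is kept as an unparsed string,
and no delimiter character is assumed.

```python
# Hypothetical sketch: parse the e0..e3 grammar into nested tuples.
META = set(".*+?[]()|\\^$")

def parse(src):
    tree, pos = parse_e0(src, 0)
    if pos != len(src):
        raise ValueError(f"unexpected {src[pos]!r} at {pos}")
    return tree

def parse_e0(src, pos):
    # e0: e1 | e0 '|' e1
    left, pos = parse_e1(src, pos)
    while pos < len(src) and src[pos] == "|":
        right, pos = parse_e1(src, pos + 1)
        left = ("alt", left, right)
    return left, pos

def parse_e1(src, pos):
    # e1: e2 | e1 e2   (one or more e2, grouped to the left)
    left, pos = parse_e2(src, pos)
    while pos < len(src) and src[pos] not in "|)":
        right, pos = parse_e2(src, pos)
        left = ("cat", left, right)
    return left, pos

def parse_e2(src, pos):
    # e2: e3 | e2 REP
    node, pos = parse_e3(src, pos)
    while pos < len(src) and src[pos] in "*+?":
        node = ("rep", src[pos], node)
        pos += 1
    return node, pos

def parse_e3(src, pos):
    # e3: literal | charclass | '.' | '^' | '$' | '(' e0 ')'
    if pos >= len(src):
        raise ValueError("unexpected end of expression")
    c = src[pos]
    if c == "(":
        node, pos = parse_e0(src, pos + 1)
        if pos >= len(src) or src[pos] != ")":
            raise ValueError("missing ')'")
        return ("group", node), pos + 1
    if c == "[":
        end = pos + 1
        while end < len(src) and src[end] != "]":
            end += 2 if src[end] == "\\" else 1   # skip escaped characters
        if end >= len(src):
            raise ValueError("missing ']'")
        return ("class", src[pos + 1:end]), end + 1
    if c == "\\":
        if pos + 1 >= len(src):
            raise ValueError("trailing backslash")
        return ("lit", src[pos + 1]), pos + 2
    if c in ".^$":
        return (c,), pos + 1
    if c in META:
        raise ValueError(f"unexpected {c!r} at {pos}")
    return ("lit", c), pos + 1
```

In this sketch, parse("ab*|c") returns an alt node whose left branch is the concatenation of a with b*, which reflects that REP binds tighter than concatenation and concatenation binds tighter than alternation.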
.PP
A
.B literal
is any non-metacharacter, or a metacharacter
(one of
.BR .*+?[]()|\e^$ ),
or the delimiter
preceded by
.LR \e .
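.PP
As a minimal sketch of the quoting rule, the hypothetical Python helper below
prefixes each metacharacter, and the delimiter, with a backslash
so that every character of the input is read as a literal.
The name quote_literal and the default delimiter are assumptions made for illustration.

```python
# Hypothetical helper: quote text so each character is a literal
# in the notation described on this page.
META = ".*+?[]()|\\^$"

def quote_literal(text, delimiter="/"):
    out = []
    for ch in text:
        if ch in META or ch == delimiter:
            out.append("\\")   # escape metacharacters and the delimiter
        out.append(ch)
    return "".join(out)
```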
.PP
A
.B charclass
is a nonempty string
.I s
bracketed
.BI [ \|s\| ]
(or
.BI [^ s\| ]\fR);
it matches any character in (or not in)
.IR s .
A negated character class never
matches newline.
A substring
.IB a - b\f1,
with
.I a
and
.I b
in ascending
order, stands for the inclusive
range of
characters between
.I a
and
.IR b .
In
.IR s ,
the metacharacters
.LR - ,
.LR ] ,
an initial
.LR ^ ,
and the regular expression delimiter
must be preceded by a
.LR \e ;
other metacharacters
have no special meaning and
may appear unescaped.
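.PP
The character class rules can be sketched directly.
The hypothetical Python function below takes the text between the brackets
(including a leading ^ for a negated class) and one character,
and reports whether the class matches it.
It is an illustration of the rules as stated here, not code from regexp(2);
treating a stray unescaped hyphen as an ordinary character is an assumption of the sketch.

```python
# Hypothetical sketch: does the character class [s] match ch?
def class_matches(s, ch):
    negated = s.startswith("^")
    i = 1 if negated else 0
    items = []                      # (character, was_escaped) pairs
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s):
            items.append((s[i + 1], True))
            i += 2
        else:
            items.append((s[i], False))
            i += 1
    found = False
    j = 0
    while j < len(items):
        lo = items[j][0]
        # An unescaped '-' between two characters forms an inclusive range.
        if j + 2 < len(items) and items[j + 1] == ("-", False):
            hi = items[j + 2][0]
            found = found or lo <= ch <= hi
            j += 3
        else:
            found = found or ch == lo
            j += 1
    if negated:
        # A negated class never matches newline.
        return ch != "\n" and not found
    return found
```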
.PP
A
.L .
matches any character.
.PP
A
.L ^
matches the beginning of a line;
.L $
matches the end of the line.
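.PP
A short illustration of the dot and the two anchors, using Python's re module as a stand-in.
Python's dialect is not the one specified on this page;
its MULTILINE flag is assumed here to approximate the line-oriented meaning of ^ and $,
and the helper names are hypothetical.

```python
import re

# Python's re is only a stand-in for the notation described here.
def first_chars(text):
    # '^' anchors at the beginning of each line, '.' takes one character.
    return re.findall(r"^.", text, re.MULTILINE)

def last_chars(text):
    # '$' anchors at the end of each line.
    return re.findall(r".$", text, re.MULTILINE)

def all_chars(text):
    # '.' matches any character; newline does not count as a character.
    return re.findall(r".", text)
```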
.PP
The
.B REP
operators match zero or more
.RB ( * ),
one or more
.RB ( + ),
zero or one
.RB ( ? ),
instances respectively of the preceding regular expression
.BR e2 .
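.PP
The three operators can be tried out with Python's re module, used here only as a stand-in:
it is a different implementation, though it agrees with this page on these simple cases.
The helper name matches_whole is hypothetical.

```python
import re

# Python's re is only a stand-in for the notation described here.
def matches_whole(pattern, text):
    # True when the pattern matches the entire text.
    return re.fullmatch(pattern, text) is not None
```

With this helper, ab*c accepts ac and abbbc, ab+c accepts abc but rejects ac,
and ab?c accepts ac and abc but rejects abbc.
Because a REP operator applies to the preceding e2 only, ab*c rejects ababc while (ab)*c accepts it.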
.PP
A concatenated regular expression,
.BR "e1\|e2" ,
matches a match to
.B e1
followed by a match to
.BR e2 .
.PP
An alternative regular expression,
.BR "e0\||\|e1" ,
matches either a match to
.B e0
or a match to
.BR e1 .
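.PP
To see how concatenation and alternation combine, the sketch below filters a list of candidate strings
through a pattern, again with Python's re module as a stand-in rather than the regexp(2) library.
The function name accepted is hypothetical.

```python
import re

# Python's re is only a stand-in for the notation described here.
def accepted(pattern, candidates):
    # Keep the candidates that the pattern matches in full.
    return [c for c in candidates if re.fullmatch(pattern, c)]
```

Given the candidates ab, cd, abd and acd, the pattern ab|cd keeps ab and cd,
since concatenation groups more tightly than alternation,
while a(b|c)d keeps abd and acd.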
.PP
A match to any part of a regular expression
extends as far as possible without preventing
a match to the remainder of the regular expression.
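.PP
The rule can be observed by grouping the parts of an expression and inspecting what each part matched.
The sketch below uses Python's re module as a stand-in;
its greedy operators agree with the rule on the patterns shown,
but it is a backtracking engine and is not guaranteed to agree in general,
notably for alternation, where it takes the first alternative that works.
The helper name split_match is hypothetical.

```python
import re

# Python's re is only a stand-in for the notation described here.
def split_match(pattern, text):
    # Return the text matched by each parenthesized part, or None.
    m = re.match(pattern, text)
    return m.groups() if m else None
```

For (a*)(ab) against aaab, the first part matches aa rather than aaa,
because taking the third a would prevent the remainder from matching.
For (a*)(a*) against aaa, the first part takes all three characters and the second matches the empty string.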
.SH "SEE ALSO"
.IR awk (1),
.IR ed (1),
.IR grep (1),
.IR sam (1),
.IR sed (1),
.IR regexp (2)