Recursive patterns

Consider the problem of matching a string in parentheses, allowing for unlimited nested parentheses. Without the use of recursion, the best that can be done is to use a pattern that matches up to some fixed depth of nesting. It is not possible to handle an arbitrary nesting depth. Perl 5.6 has provided an experimental facility that allows regular expressions to recurse (among other things). The special item (?R) is provided for the specific case of recursion. This PCRE pattern solves the parentheses problem (assume the PCRE_EXTENDED option is set so that white space is ignored): \( ( (?>[^()]+) | (?R) )* \)

First it matches an opening parenthesis. Then it matches any number of substrings which can either be a sequence of non-parentheses, or a recursive match of the pattern itself (i.e. a correctly parenthesized substring). Finally there is a closing parenthesis.

This particular example pattern contains nested unlimited repeats, and so the use of a once-only subpattern for matching strings of non-parentheses is important when applying the pattern to strings that do not match. For example, when it is applied to (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa() it yields "no match" quickly. However, if a once-only subpattern is not used, the match runs for a very long time indeed because there are so many different ways the + and * repeats can carve up the subject, and all have to be tested before failure can be reported.

The values set for any capturing subpatterns are those from the outermost level of the recursion at which the subpattern value is set. If the pattern above is matched against (ab(cd)ef) the value for the capturing parentheses is "ef", which is the last value taken on at the top level. If additional parentheses are added, giving \( ( ( (?>[^()]+) | (?R) )* ) \) then the string they capture is "ab(cd)ef", the contents of the top level parentheses. If there are more than 15 capturing parentheses in a pattern, PCRE has to obtain extra memory to store data during a recursion, which it does by using pcre_malloc, freeing it via pcre_free afterwards. If no memory can be obtained, it saves data for the first 15 capturing parentheses only, as there is no way to give an out-of-memory error from within a recursion.

(?1), (?2) and so on can be used for recursive subpatterns too. It is also possible to use named subpatterns: (?P>name) or (?&name).

If the syntax for a recursive subpattern reference (either by number or by name) is used outside the parentheses to which it refers, it operates like a subroutine in a programming language. An earlier example pointed out that the pattern (sens|respons)e and \1ibility matches "sense and sensibility" and "response and responsibility", but not "sense and responsibility". If instead the pattern (sens|respons)e and (?1)ibility is used, it does match "sense and responsibility" as well as the other two strings. Such references must, however, follow the subpattern to which they refer.

The maximum length of a subject string is the largest positive number that an integer variable can hold. However, PCRE uses recursion to handle subpatterns and indefinite repetition. This means that the available stack space may limit the size of a subject string that can be processed by certain patterns.

Here you can write a comment


Please enter at least 10 characters.
Loading... Please wait.
* Pflichtangabe
There are no comments available yet.

PHP cURL Tutorial: Using cURL to Make HTTP Requests

cURL is a powerful PHP extension that allows you to communicate with different servers using various protocols, including HTTP, HTTPS, FTP, and more. ...

TheMax

Autor : TheMax
Category: PHP-Tutorials

Midjourney Tutorial - Instructions for beginners

There is an informative video about Midjourney, the tool for creating digital images using artificial intelligence, entitled "Midjourney tutorial in German - instructions for beginners" ...

Mike94

Autor : Mike94
Category: KI Tutorials

Basics of views in MySQL

Views in a MySQL database offer the option of creating a virtual table based on the result of an SQL query. This virtual table can be queried like a normal table without changing the underlying data. ...

admin

Autor : admin
Category: mySQL-Tutorials

Publish a tutorial

Share your knowledge with other developers worldwide

Share your knowledge with other developers worldwide

You are a professional in your field and want to share your knowledge, then sign up now and share it with our PHP community

learn more

Publish a tutorial