Use a Perl regular expression to match a pattern that spans multiple lines in SAS


I have an HTML snippet stored in html.html:

```html
</head>
<body>
  <h4>areas of interest</h4>
  <ul>
    <li>interest</li>
    <li>interest</li>
  </ul>
  <h4>other</h4>
</body>
```

I have built a regex pattern, /<\/h4>(\w*\w*)*<h4>/, to extract the text between the first </h4> tag and the second <h4> tag. How can SAS search for a pattern across multiple lines?

SAS code:

```sas
filename html "html.html";
data interests (drop=pattern);
    length string $2500;
    if _n_ = 1 then
        pattern = prxparse("/<\/h4>(\w*\w*)*<h4>/");
    retain pattern;
    infile html lrecl=2500;
    input string $char2500.;
    if prxmatch(pattern,string) gt 0 then output;
run;
```

Currently I am getting blank results.
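Separately from the line-by-line problem, the pattern itself probably cannot match even inside a single string: `\w` matches only letters, digits and underscores, so `(\w*\w*)*` cannot cross the spaces, `<` and `/` characters that sit between `</h4>` and the next `<h4>`. A minimal sketch, assuming the whole snippet is already held in one character variable (collapsed to single spaces here) and swapping in a lazy `(.*?)` group for the original one; the data set name `between_demo` is hypothetical:

```sas
/* Hypothetical demo: the snippet is assumed to be in ONE variable already */
data between_demo (keep=between);
    length snippet between $200;
    snippet = '</head> <body> <h4>areas of interest</h4> <ul> <li>interest</li> <li>interest</li> </ul> <h4>other</h4> </body>';
    /* (.*?) accepts any characters, lazily, so it stops at the first <h4>;
       \w* would have stopped at the first blank or '<' */
    re = prxparse('/<\/h4>(.*?)<h4>/');
    if prxmatch(re, snippet) > 0 then do;
        /* capture buffer 1 holds the text between </h4> and <h4> */
        between = strip(prxposn(re, 1, snippet));
        output;
    end;
run;
```

Here `between` should hold the `<ul>...</ul>` block. This only helps once the text is in one string, which is the issue the answer addresses.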

I don't think you can do it that way in SAS.

SAS sees each row of the HTML file as a separate observation; if you want to parse them as a whole, you need some logic to join them together. What follows is an example that works on your sample code, but it will fail on tons of special cases. As the commenters have indicated, you might be better off using an HTML parser if you can.

This example starts (or restarts) the line concatenation when it sees <h4>, and outputs the concatenated line when it sees </h4>.

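```sas
data interests (keep=multiline);
    length multiline $250;
    set html;
    string = trim(string);
    retain multiline;
    /* keep appending lines to the running text */
    multiline = cats(multiline,string);
    /* an opening <h4> restarts the concatenation */
    if find(string,'<h4>') > 0 then multiline = string;
    /* a closing </h4> writes out what has been collected, then resets */
    if find(string,'</h4>') > 0 then do;
        output;
        multiline = '';
    end;
run;
```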
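The step above expects a data set `html` with one line per observation in `string` (for example, the question's INFILE/INPUT step without the PRXMATCH filter). As a hedged alternative aimed at the question's goal, the sketch below reads the file directly, joins every line into one string, and only then applies a lazy pattern. It writes the sample to a `temp` fileref so it runs on its own; the variable names `line`, `whole` and `between` are hypothetical, and because a SAS character variable holds at most 32,767 bytes, this only suits small files:

```sas
/* Setup so the sketch runs on its own: write the sample to a temporary
   file. For the real file, use  filename html "html.html";  instead. */
filename html temp;

data _null_;
    file html;
    put '</head>';
    put '<body>';
    put '  <h4>areas of interest</h4>';
    put '  <ul>';
    put '    <li>interest</li>';
    put '    <li>interest</li>';
    put '  </ul>';
    put '  <h4>other</h4>';
    put '</body>';
run;

data interests (keep=between);
    length line $2500 whole between $32767;
    /* Compile the pattern once; (.*?) is lazy, so it stops at the first <h4> */
    if _n_ = 1 then re = prxparse('/<\/h4>(.*?)<h4>/');
    retain whole re;
    /* TRUNCOVER stops short lines from flowing into the next record;
       END= sets last to 1 while the final line is being read */
    infile html lrecl=2500 truncover end=last;
    input line $char2500.;
    /* Append each physical line to one string, separated by a blank */
    whole = catx(' ', whole, line);
    /* Match once, against the fully joined text */
    if last and prxmatch(re, whole) > 0 then do;
        between = strip(prxposn(re, 1, whole));
        output;
    end;
run;
```

This captures only the first match; to collect every section between headings, CALL PRXNEXT can be used to loop over successive matches in `whole`.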
