html - Use a Perl regular expression to match a pattern that spans multiple lines in SAS
I have an HTML snippet stored in html.html:
</head>
<body>
<h4>areas of interest</h4>
<ul>
<li>interest</li>
<li>interest</li>
</ul>
<h4>other</h4>
</body>
I have built a regex pattern (/<\/h4>(\w*\w*)*<h4>/) that is meant to match and extract the text between the first </h4> tag and the second <h4> tag. How can I make SAS search for the pattern across multiple lines?
SAS code:
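filename html "html.html";

data interests (drop=pattern);
  length string $2500;
  /* compile the regex once, on the first iteration */
  if _n_ = 1 then pattern = prxparse("/<\/h4>(\w*\w*)*<h4>/");
  retain pattern;
  infile html lrecl=2500;
  /* each input line becomes one observation */
  input string $char2500.;
  if prxmatch(pattern,string) gt 0 then output;
run;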
Currently I am getting blank results.
I don't think you can do it that way in SAS.
SAS sees each row of the HTML as a separate observation; if you want to parse them as a whole, you'd need logic to join them together. What follows is an example that works on your sample code, but it will fail on tons of special cases. As commenters have indicated, you might be better off using an HTML parser if you can.
This example starts (or restarts) the line concatenation when it sees <h4>, and outputs the concatenated line when it sees </h4>.
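/* assumes a dataset named html with one line of the file per observation, in the variable string */
data interests (keep=multiline);
  length multiline $250;
  set html;
  string = trim(string);
  retain multiline;
  /* append the current line to whatever has been collected so far */
  multiline = cats(multiline,string);
  /* an opening tag (re)starts the collection from this line */
  if find(string,'<h4>') > 0 then multiline = string;
  /* a closing tag writes out the collected text and clears it */
  if find(string,'</h4>') > 0 then do;
    output;
    multiline = '';
  end;
run;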
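Another way to apply the same "join them together" idea, as a rough sketch only: read every line into one long string and run the regex once at the end of the file. This assumes the file is reachable as html.html, that each line is no longer than 2500 characters, and that the whole document fits in a single 32767-character variable. The names alltext and between, and the non-greedy pattern with the i modifier, are my own choices for illustration and not part of the original question.

```sas
filename html "html.html";   /* assumed location of the file */

data interests (keep=between);
  length alltext $32767 between $2500 string $2500;
  retain alltext;
  infile html lrecl=2500 truncover end=eof;
  input string $char2500.;

  /* catx trims each line and joins the pieces with a single space */
  alltext = catx(' ', alltext, string);

  /* on the last line, search the joined text once */
  if eof then do;
    /* (.*?) is non-greedy, so it stops at the first <h4> after </h4>; */
    /* the i modifier ignores the case of the tag names                */
    pattern = prxparse('/<\/h4>(.*?)<h4>/i');
    if prxmatch(pattern, alltext) > 0 then do;
      /* capture buffer 1 holds the text between the two tags */
      between = prxposn(pattern, 1, alltext);
      output;
    end;
  end;
run;
```

On the sample above, between should end up holding the <ul> ... </ul> block. Like the example before it, this is likely to break on real-world HTML (nested headings, attributes on the tags, documents longer than 32767 characters), so an HTML parser is still the safer route if one is available.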