javascript - Get all links from HTML page using regex
I'm using Google Apps Script to fetch the content of emails from Gmail, and after that I need to extract all of the links from the HTML tags. I found some code here on Stack Overflow and implemented the regular expression, but the issue is that it returns only the first URL (http://vacante2016.eu/tr/17599/51743713/c4f5eadf38eb475d39e3cdeca9201538).
Is there a way to make a loop that searches for the next content matching the regex and displays all of the elements one by one?
Here you can see an example of the email content I need the links from: https://www.mailinator.com/inbox2.jsp?public_to=get_urls#/#public_showmaildiv
This is the code:
function getURL() {
  var threads = GmailApp.getInboxThreads();
  var message = threads[0].getMessages()[0];
  var content = message.getRawContent();
  var source = (content || '').toString();
  var urlArray = [];
  var url;
  var matchArray;
  // Regular expression to find FTP, HTTP(S) URLs.
  var regexToken = /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/;
  // Iterate through the URLs in the text.
  while( (matchArray = regexToken.exec( source )) !== null ) {
    var token = matchArray[0];
    urlArray.push( token );
  }
}
Update: I changed the regex to /(?:ht|f)tps?\:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/[\S=]*)?/g
This improved things, but I get the following type of response when I search for URLs: "http://vacante2016.eu/clk/17599/5=\r\n1743713/150132/bf7639dd7e7aa48c9197a52a8c61e168\"><img"
... I think the regex should have a condition to return the URL only up to the > symbol.
Also, is there a way to remove the additional characters =, \r, \n found in the URL?
You need to use the global modifier /g to get multiple matches with RegExp#exec.
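As a minimal sketch of that loop, the snippet below uses the pattern from the question with the /g flag added. The function name findAllUrls and the sample text are hypothetical and only serve to illustrate the behaviour; with /g, each call to exec resumes from the regex's lastIndex instead of starting over at the beginning of the string.

```javascript
// Hypothetical helper: collect every URL match in a string.
function findAllUrls(source) {
  // The /g flag makes exec() continue from regex.lastIndex on each call.
  var regexToken = /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;
  var urlArray = [];
  var matchArray;
  while ((matchArray = regexToken.exec(source)) !== null) {
    urlArray.push(matchArray[0]); // matchArray[0] is the full match
  }
  return urlArray;
}

// Sample input (an assumption, plain text rather than HTML).
var sampleText = 'Visit http://example.com/a/1 or ftp://files.example.org/pub today';
var found = findAllUrls(sampleText);
// found is ["http://example.com/a/1", "ftp://files.example.org/pub"]
```

Without the /g flag, lastIndex is never advanced, so exec keeps returning the same first match.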
Besides, since your input is HTML code, you need to make sure you do not grab < with \S:
/(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g
See the regex demo.
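To illustrate the difference, here is a small comparison of the \S version and the negated character class on a made-up HTML fragment. The variable names and the example.com markup are assumptions for demonstration only.

```javascript
// Pattern that stops at a double quote or "<" (suited to URLs inside HTML attributes).
var htmlSafeUrl = /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g;
// Pattern that consumes any run of non-whitespace characters after the host.
var greedyUrl = /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/\S*)?/g;

// Hypothetical HTML sample.
var html = '<a href="http://example.com/path/page"><img src="x.png"></a>';

// String#match with a /g regex returns all matches, or null when there are none.
var safeMatches = html.match(htmlSafeUrl) || [];
var greedyMatches = html.match(greedyUrl) || [];
// safeMatches[0]   is 'http://example.com/path/page'
// greedyMatches[0] is 'http://example.com/path/page"><img'
```

Note that [^"<] also accepts whitespace, so this pattern fits URLs inside quoted HTML attributes better than bare URLs in running text.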
If for some reason this pattern does not match equal signs, add an alternative:
/(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?/g
See another demo (however, the first one should do).
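As for the stray =, \r and \n characters, one possible approach is to strip them before matching. The sketch below assumes the raw message body is quoted-printable encoded, where "=" followed by a line break is a soft line break; that is a guess based on the sample output in the question, not something the answer confirms. The helper names removeSoftLineBreaks and extractUrls are hypothetical, and other quoted-printable escapes (such as =3D) are not decoded here.

```javascript
// Assumption: "=" followed by CRLF or LF is a quoted-printable soft line break.
function removeSoftLineBreaks(raw) {
  return raw.replace(/=\r?\n/g, '');
}

// Hypothetical helper: clean the raw text first, then collect all URLs.
function extractUrls(raw) {
  var cleaned = removeSoftLineBreaks(raw);
  var urlRegex = /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/[^"<]*)?/g;
  return cleaned.match(urlRegex) || [];
}

// Made-up sample with a soft line break in the middle of the URL.
var rawSample = '<a href="http://example.com/clk/17599/5=\r\n1743713/150132">x</a>';
var cleanedUrls = extractUrls(rawSample);
// cleanedUrls is ["http://example.com/clk/17599/51743713/150132"]
```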