javascript - Get all links from an HTML page using regex


I'm using Google Apps Script to fetch the content of emails from Gmail, and after that I need to extract all of the links from the HTML tags. I found some code here on Stack Overflow and implemented the regular expression, but the issue is that it returns only the first URL (http://vacante2016.eu/tr/17599/51743713/c4f5eadf38eb475d39e3cdeca9201538).

Is there a way to make a loop that searches for the next content matching the regex and displays all of the elements one by one?

Here you can see an example of the email content I need to get the links from: https://www.mailinator.com/inbox2.jsp?public_to=get_urls#/#public_showmaildiv

This is the code:

    function geturl() {
      var threads = GmailApp.getInboxThreads();
      var message = threads[0].getMessages()[0];
      var content = message.getRawContent();

      var source = (content || '').toString();
      var urlArray = [];
      var url;
      var matchArray;

      // Regular expression to find FTP and HTTP(S) URLs.
      var regexToken = /(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,3}(\/\S*)?/;

      // Iterate through any URLs in the text.
      while ((matchArray = regexToken.exec(source)) !== null) {
        var token = matchArray[0];
        urlArray.push(token);
      }
    }

Update: I changed the regex to /(?:ht|f)tps?\:\/\/[a-zA-Z0-9\-.]+\.[a-zA-Z]{2,3}(\/[\S=]*)?/g, which improved things, but I get the following type of response when I search for URLs: "http://vacante2016.eu/clk/17599/5=\r\n1743713/150132/bf7639dd7e7aa48c9197a52a8c61e168\"><img" ... I think the regex should have a condition to return the URL only up to the > symbol.

Also, is there a way to remove the additional characters =, \r and \n found in the URL?

You need to use the global modifier /g to get multiple matches with RegExp#exec.

Besides, since the input is HTML code, you need to make sure you do not grab < with \S:

    /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g

See the regex demo.
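To illustrate how the /g flag changes the behaviour of the exec loop, here is a minimal sketch using the pattern above. The function name `extractUrls` and the sample HTML string are made up for illustration and are not part of the original code; in Apps Script the input would come from `message.getRawContent()` instead.

```javascript
// Hypothetical helper (not from the original post): collects every URL match.
function extractUrls(source) {
  // With /g, each exec() call resumes from regexToken.lastIndex,
  // so the loop walks through all matches and ends when exec() returns null.
  var regexToken = /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g;
  var urlArray = [];
  var matchArray;
  while ((matchArray = regexToken.exec(source || '')) !== null) {
    urlArray.push(matchArray[0]);
  }
  return urlArray;
}

// Made-up sample input, for illustration only.
var sampleHtml = '<a href="http://example.com/a/1">one</a> ' +
                 '<img src="https://example.org/img.png">';
var urls = extractUrls(sampleHtml);
// urls is ["http://example.com/a/1", "https://example.org/img.png"]
```

Without the /g flag, `lastIndex` is never advanced, so every call to `exec` starts from the beginning of the string and finds the same first match again.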

If for some reason this pattern does not match equal signs, add it as an alternative:

    /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?/g

See another demo (however, the first one should do).
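As for the stray =, \r and \n characters in the question: an "=" immediately followed by a line break looks like a quoted-printable soft line break, which is common in raw email content. Assuming that is what is happening here, one possible approach is to strip those sequences before running the regex. The helper name `removeSoftLineBreaks` below is hypothetical, and the sample string is taken from the question's output.

```javascript
// Hypothetical helper: removes quoted-printable soft line breaks,
// i.e. an "=" immediately followed by CRLF (or a bare LF).
// Assumption: the raw message body is quoted-printable encoded.
function removeSoftLineBreaks(rawContent) {
  return (rawContent || '').replace(/=\r?\n/g, '');
}

// Sample taken from the output shown in the question.
var raw = 'http://vacante2016.eu/clk/17599/5=\r\n' +
          '1743713/150132/bf7639dd7e7aa48c9197a52a8c61e168"><img';

var cleaned = removeSoftLineBreaks(raw);
var urlRegex = /(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g;
var found = cleaned.match(urlRegex);
// found is ["http://vacante2016.eu/clk/17599/51743713/150132/bf7639dd7e7aa48c9197a52a8c61e168"]
```

This only rejoins wrapped lines. It does not decode other quoted-printable escapes such as =3D (an encoded "="), so URLs containing query strings may need a full decoder.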

