php - Regex match for hyperlink
I want to match hyperlinks that carry different attributes: href, rel, target, and media. I am looking for a regex that handles these attributes (rel and media are optional).
Because I am inserting this code into a parser, I cannot afford to use the DOM class, so I am looking for a regex solution.
Let me take the example below to explain:
<a href="http://www.google.com" rel="nofollow" target="_blank">google</a>
<a href="http://www.google.com" rel="follow" target="_blank">google</a>
<a href="http://www.google.com" target="_blank">google</a>
This is what I have got so far:
/<a\s?(href=)?('|")(.*)('|") (rel='|")(nofollow|follow)('|") target=('|")_blank('|") (media='|")(.*?)('|")>(.*)<\/a>/
Here is a solution using PHP's DOMDocument class. I incorporated logic to check for required / optional attributes:
// Load the HTML
$doc = new DOMDocument;
$doc->loadHTML( $html);

// Define the attributes we are looking for, in name => required pairs
$attributes = array( 'href' => true, 'rel' => false, 'target' => true, 'media' => false);

$parsed_tags = array();

// Iterate over all of the <a> tags
foreach( $doc->getElementsByTagName( 'a') as $a) {
    $tag_attributes = array();
    foreach( $attributes as $name => $required) {
        if( !$a->hasAttribute( $name)) {
            if( $required) {
                echo 'Error, tag is required to have the ' . $name . ' attribute, and it is missing' . "\n";
                // Skip this <a> tag entirely and move on to the next one
                continue 2;
            }
        } else {
            // It has this attribute; required or not, let's grab it
            $tag_attributes[$name] = $a->getAttribute( $name);
        }
    }
    $parsed_tags[] = $tag_attributes;
}

With this HTML string:
$html = '<a href="http://www.google.com" rel="nofollow" target="_blank">google</a><a href="http://www.google.com" rel="follow" target="_blank">google</a><a href="http://www.google.com" target="_blank">google</a>';

This produces:
Array
(
    [0] => Array
        (
            [href] => http://www.google.com
            [rel] => nofollow
            [target] => _blank
        )

    [1] => Array
        (
            [href] => http://www.google.com
            [rel] => follow
            [target] => _blank
        )

    [2] => Array
        (
            [href] => http://www.google.com
            [target] => _blank
        )

)

Note that with this solution, because I'm checking whether the required attributes are present and doing continue 2; if they aren't, <a> tags without the required attributes are skipped. As seen in this demo, the tag <a href="http://www.google.com">google</a> outputs the error string I put in, but does not get included in the output array.
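Since the question rules out the DOM classes, the same required / optional check can also be approximated with regex alone. The following is only a rough sketch, not part of the answer above: parse_links() is a hypothetical helper name, and it assumes attribute values are always quoted and never contain a ">" character. It works in two passes (first grab each opening <a ...> tag, then pick out its name="value" pairs), so the attributes may appear in any order. Regex-based HTML parsing stays fragile on real-world markup.

```php
<?php
// Hypothetical helper (not from the original answer): regex-only version of
// the required/optional attribute check. Assumes quoted attribute values
// that do not contain ">".
function parse_links(string $html, array $attributes): array
{
    $parsed_tags = array();

    // Pass 1: capture the raw attribute text of every opening <a ...> tag.
    preg_match_all('/<a\s+([^>]*)>/i', $html, $tags);

    foreach ($tags[1] as $attribute_text) {
        // Pass 2: collect name="value" or name='value' pairs, in any order.
        preg_match_all(
            '/([a-z][a-z0-9-]*)\s*=\s*(["\'])(.*?)\2/is',
            $attribute_text,
            $pairs,
            PREG_SET_ORDER
        );

        $found = array();
        foreach ($pairs as $pair) {
            $found[strtolower($pair[1])] = $pair[3];
        }

        $tag_attributes = array();
        foreach ($attributes as $name => $required) {
            if (isset($found[$name])) {
                // Attribute is present; required or not, keep it.
                $tag_attributes[$name] = $found[$name];
            } elseif ($required) {
                // A required attribute is missing: skip this tag entirely.
                continue 2;
            }
        }
        $parsed_tags[] = $tag_attributes;
    }

    return $parsed_tags;
}

// Usage: the second tag lacks the required target attribute and is skipped.
$html = '<a href="http://www.google.com" rel="nofollow" target="_blank">google</a>'
      . '<a href="http://www.google.com">google</a>';
$attributes = array('href' => true, 'rel' => false, 'target' => true, 'media' => false);
print_r(parse_links($html, $attributes));
```

Unlike the DOMDocument version, this sketch skips invalid tags silently instead of echoing an error, and it ignores <a> tags that have no attributes at all.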