Shell script to fetch the value of a node appearing multiple times in an XML file
I have the XML below:
<artifact>
  <a>1.zip</a>
  <b>2-snapshot.zip</b>
  <c>3-snapshot.zip</c>
</artifact>
<artifact>
  <a>4.tar</a>
  <b>5.tar</b>
  <c>6.tar</c>
</artifact>
My requirement is to fetch the value "5.tar", which appears in the second occurrence of the node "artifact". I am able to fetch a value when the node is present only once in the XML. However, when the same node appears twice or more in the same XML, I am not able to fetch it.
Please help.
I have broken down the answer I tried using xmllint:
$ echo "cat //root/artifact/b" | xmllint --shell buildresult.xml | sed '/^\/ >/d' | sed 's/<[^>]*>//g' | tr -d '\n' | awk -F"-------" '{print $2}'
5.tar
I have formatted the original buildresult.xml file by adding <root> nodes and the proprietary header information, to avoid parsing errors:
$ xmllint --format buildresult.xml
<?xml version="1.0" standalone="yes"?>
<root>
  <artifact>
    <a>1.zip</a>
    <b>2-snapshot.zip</b>
    <c>3-snapshot.zip</c>
  </artifact>
  <artifact>
    <a>4.tar</a>
    <b>5.tar</b>
    <c>6.tar</c>
  </artifact>
</root>
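The original fragment has two top-level <artifact> elements, and an XML document must have exactly one root element, which is why wrapping it in <root> avoids the parsing errors. A minimal sketch of that wrapping step, assuming the fragment sits in a hypothetical file artifacts-fragment.xml (the author's proprietary header is not reproduced here):

```shell
#!/bin/sh
# Hypothetical input file: two sibling <artifact> elements with no common
# parent, which is not well-formed XML on its own.
cat > artifacts-fragment.xml <<'EOF'
<artifact> <a>1.zip</a> <b>2-snapshot.zip</b> <c>3-snapshot.zip</c> </artifact> <artifact> <a>4.tar</a> <b>5.tar</b> <c>6.tar</c> </artifact>
EOF

# Write an XML declaration, then wrap the fragment in a single <root> element
# so that xmllint (or any other XML parser) can read it.
{
  echo '<?xml version="1.0" standalone="yes"?>'
  echo '<root>'
  cat artifacts-fragment.xml
  echo '</root>'
} > buildresult.xml

cat buildresult.xml
```

Running xmllint --format on the resulting file should then give the indented layout shown above.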
The steps executed:
Start parsing the file from the root node down to the repeating node (//root/artifact/b) by running xmllint in interactive shell mode (xmllint --shell).
Running the command as-is produces this result:
/ >  -------
<b>2-snapshot.zip</b>
 -------
<b>5.tar</b>
/ >
Now remove the prompt lines and the tags using sed, i.e. sed '/^\/ >/d' | sed 's/<[^>]*>//g', which produces:
2-snapshot.zip
 -------
5.tar
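To see what the two sed commands do without running xmllint, here is a sketch that feeds them an assumed copy of the raw shell output. The line breaks in that sample are reconstructed from the flattened output above and may differ slightly between libxml2 versions; the variable names raw and stripped are illustrative.

```shell
#!/bin/sh
# Assumed raw output of: echo "cat //root/artifact/b" | xmllint --shell buildresult.xml
# (line breaks reconstructed; the first separator shares a line with the prompt).
raw='/ >  -------
<b>2-snapshot.zip</b>
 -------
<b>5.tar</b>
/ > '

# 1) sed '/^\/ >/d'        deletes every line that starts with the "/ >" prompt.
# 2) sed 's/<[^>]*>//g'    removes every <...> tag, keeping only the text.
stripped=$(printf '%s\n' "$raw" | sed '/^\/ >/d' | sed 's/<[^>]*>//g')
printf '%s\n' "$stripped"
```

Because the first "-------" shares a line with the prompt, it is deleted together with it, leaving exactly one separator between the two values.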
Now remove the newlines from the above output using tr, so that awk can process the result as a single record with ------- as the field separator:
2-snapshot.zip -------5.tar
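The tr step on its own, applied to the sed output written as a literal string (a small sketch; stage2 and joined are illustrative names):

```shell
#!/bin/sh
# Output of the sed stage: prompt lines and tags already removed.
stage2='2-snapshot.zip
 -------
5.tar'

# tr -d '\n' deletes every newline, so both values end up on one line,
# still separated by the "-------" marker that xmllint printed between nodes.
joined=$(printf '%s\n' "$stage2" | tr -d '\n')
printf '%s\n' "$joined"
```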
Running awk -F"-------" '{print $2}' on the above output produces the value needed:
5.tar
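A sketch of the awk step on the joined line. Note the uppercase -F, which sets the field separator; a lowercase -f would instead make awk try to read its program from a file named "-------".

```shell
#!/bin/sh
# Output of the tr stage: everything on one line.
joined='2-snapshot.zip -------5.tar'

# With "-------" as the field separator, $1 is "2-snapshot.zip " (note the
# trailing space) and $2 is "5.tar", the value from the second <artifact>.
second=$(printf '%s\n' "$joined" | awk -F'-------' '{print $2}')
printf '%s\n' "$second"
```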
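Putting it all in a shell script, it looks like this:
#!/bin/bash
# Print every <b> under //root/artifact via xmllint's shell, strip the prompt lines
# and tags, join the lines, then take the field after the "-------" separator.
newvar=$(echo "cat //root/artifact/b" | xmllint --shell buildresult.xml | sed '/^\/ >/d' | sed 's/<[^>]*>//g' | tr -d '\n' | awk -F"-------" '{print $2}')
echo "$newvar"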
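As an alternative to the text-processing chain, xmllint can evaluate an XPath expression directly with --xpath, and a positional predicate such as artifact[2] picks out the second <artifact>. This is a swapped-in technique, not part of the answer above, and it assumes an xmllint build that supports --xpath (some older builds do not). The sketch recreates buildresult.xml first so it runs standalone:

```shell
#!/bin/sh
# Recreate the sample buildresult.xml shown earlier (already wrapped in <root>).
cat > buildresult.xml <<'EOF'
<?xml version="1.0" standalone="yes"?>
<root>
  <artifact> <a>1.zip</a> <b>2-snapshot.zip</b> <c>3-snapshot.zip</c> </artifact>
  <artifact> <a>4.tar</a> <b>5.tar</b> <c>6.tar</c> </artifact>
</root>
EOF

# artifact[2] selects the second <artifact> element (XPath positions start at 1);
# string(...) returns only the text of its <b> child, without the tags.
# Assumption: this xmllint supports the --xpath option.
newvar=$(xmllint --xpath 'string(/root/artifact[2]/b)' buildresult.xml)
echo "$newvar"
```

Changing the 2 in artifact[2] selects a different occurrence, with no sed, tr, or awk needed.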
P.S.: The number of commands could be reduced/simplified by trimming the awk/sed combination, but this solution works.
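Along the lines of the P.S., the sed | sed | tr | awk part can be collapsed into a single awk call. A minimal sketch, again using an assumed copy of the raw xmllint --shell output with reconstructed line breaks; in a real script the printf would be replaced by the echo "cat //root/artifact/b" | xmllint --shell buildresult.xml command.

```shell
#!/bin/sh
# Assumed raw output of: echo "cat //root/artifact/b" | xmllint --shell buildresult.xml
raw='/ >  -------
<b>2-snapshot.zip</b>
 -------
<b>5.tar</b>
/ > '

# Split each line on "<" or ">": for "<b>5.tar</b>" the fields are
# "", "b", "5.tar", "/b", "", so $3 holds the element text.
# Count the lines containing <b> and print the text of the second one.
newvar=$(printf '%s\n' "$raw" | awk -F'[<>]' '/<b>/ { n++; if (n == 2) print $3 }')
echo "$newvar"
```

Changing n == 2 to another number selects a different occurrence of <b>.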