IBM interview question

Programming Challenge Description: Crawl an HTML page to extract file names containing certain patterns from hyperlinks

Given a one-line HTML "page", find all hyperlinks (e.g., URLs specified within <a href=http://sample.url/> and not as plain text) and return the names of all zip files that contain the word "data" and come from the www.example.com web site. The output should be a comma-separated list without spaces. If no such file is detected, print an empty line (e.g., “”).

Example:
Input: <a href=http://www.example.com/global_data.zip > some text </a> Some other text <a href=http://www.example.com/local_data.zip> some more text </a>
Output: global_data.zip,local_data.zip

Input:
Your program should read lines of text from standard input. Each line may or may not contain one or more target files.

Output:
Print to standard output a single line containing a comma-separated list of such file names without spaces. If no such file is detected, print an empty line (e.g., “”).

Test 1
Test Input: <a href="http://www.example.com/files/world_data1.zip"><b>World Data Part 1</b></a> <br/> <a href="http://www.example.com/files/world_data2.zip"><b>World Data Part 2</b></a>
Expected Output: world_data1.zip,world_data2.zip

Test 2
Test Input: <td background="./files/buttonbg.gif"><a href="http://www.example.com/global_data.zip" onmouseover="setOverImg('11','');
Expected Output: global_data.zip
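
One possible approach, sketched in Python, uses a regular expression rather than a full HTML parser, since the sample inputs include unquoted href values and a tag that is cut off mid-attribute. The names extract_data_zips, HREF_RE, and main are made up for this illustration. The sketch also assumes details the problem statement leaves open: the word "data" is matched case-sensitively and only in the file name (the last path segment), the ".zip" extension is matched case-insensitively, and the host must be exactly www.example.com.

```python
import re
import sys
from urllib.parse import urlparse

# Captures the href value of an <a> tag, whether quoted or unquoted.
# A regex is an assumption of this sketch; it tolerates truncated tags.
HREF_RE = re.compile(r"""<a\s[^>]*?href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def extract_data_zips(html, host="www.example.com", word="data"):
    """Return zip file names containing `word` that are linked from `host`."""
    names = []
    for url in HREF_RE.findall(html):
        parsed = urlparse(url)
        if parsed.hostname != host:
            continue  # link points at a different site (or is relative)
        filename = parsed.path.rsplit("/", 1)[-1]  # last path segment
        if filename.lower().endswith(".zip") and word in filename:
            names.append(filename)
    return names


def main():
    # Read everything from standard input and print one comma-separated line.
    print(",".join(extract_data_zips(sys.stdin.read())))
```

Calling main() at the bottom of a script wires the function to standard input. With no matching links the join yields an empty string, so an empty line is printed, as the task requires. URLs that appear as plain text are ignored because the pattern only looks inside <a ...> tags.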