Comparing two HTML pages?

nvedia
Posts: 3
Joined: 2014-01-31T11:33:17-07:00

Comparing two HTML pages?

Post by nvedia »

Can I use ImageMagick to compare two HTML pages as images and get a similarity percentage?

Basically, I have a lot of pages (rendered from HTML) that need to be compared with screens provided by the user.
Examples:
1. A question/answer page with 4 radio buttons --> Check that all the text is the same. The 4 radio buttons are stacked vertically, so if they are placed horizontally in the other image, the similarity should drop, etc.
2. A page with multiple images, where the page to be compared has all the same images --> It should ignore minor noise, background differences, etc.

I was thinking of converting the HTML page to an image and then applying a SIFT comparison algorithm.

Any other ideas?
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Comparing two HTML pages?

Post by snibgo »

nvedia wrote: Any other ideas?
Why not compare the HTML?
snibgo's IM pages: im.snibgo.com
nvedia

Re: Comparing two HTML pages?

Post by nvedia »

Because the screens provided don't come with HTML, and even then the HTML may not match (things like absolute vs. relative image paths, script tags, etc.).
An HTML page can have 5 images on it, and without doing an image comparison, how can we say that those images are visually the same?

Let me put it simply:
Given two images, I want to find out how similar they are. Is it worth using ImageMagick with a SIFT-style algorithm, or OpenCV?
snibgo

Re: Comparing two HTML pages?

Post by snibgo »

See http://www.imagemagick.org/script/comma ... php#metric for the metrics IM can compute. The comparison needs the images to be the same size. If necessary, you can change the sizes first (with resize, extent, etc.).
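
To make the "same size first, then a metric" idea concrete, here is a minimal sketch in plain Python. It is not IM's code: images are assumed to be grayscale lists of rows with values 0..255, and the helper names (pad_to, rmse, compare_pages) are made up for illustration. The padding roughly mimics an extent anchored at the top-left with a white background, and the metric is a root-mean-square error scaled to 0..1.

```python
import math


def pad_to(img, width, height, fill=255):
    """Pad a grayscale image (list of rows) on the right and bottom with `fill`."""
    padded = [row + [fill] * (width - len(row)) for row in img]
    padded += [[fill] * width for _ in range(height - len(img))]
    return padded


def rmse(a, b):
    """Root-mean-square error between two same-sized images, scaled to 0..1."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must be the same size")
    total = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    count = len(a) * len(a[0])
    return math.sqrt(total / count) / 255.0


def compare_pages(a, b, fill=255):
    """Pad both images to their common bounding size, then compute the RMSE."""
    width = max(len(a[0]), len(b[0]))
    height = max(len(a), len(b))
    return rmse(pad_to(a, width, height, fill), pad_to(b, width, height, fill))


if __name__ == "__main__":
    small = [[0, 0], [0, 0]]            # 2x2 black image
    large = [[0, 0, 0] for _ in range(3)]  # 3x3 black image
    print(compare_pages(small, small))  # identical images
    print(compare_pages(small, large))  # padding shows up as a difference
```

Note that the padded area itself counts as a difference, so which resizing step you choose (resize vs. extent, and the background colour) changes the score.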

It does pixel-by-pixel comparisons. So if one page has an extra line of text that shifts everything after it down, IM will report a large difference if the extra line is at the top, or a small difference if it is at the bottom.
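
The shifting effect can be illustrated with a toy example. This is only a sketch under simple assumptions: a "page" is a list of pixel rows, one row per text line, each line given a distinct gray value, and the function names (make_page, insert_line, fraction_different) are hypothetical. It counts the fraction of pixels that differ when one extra line is inserted at the top versus near the bottom.

```python
def make_page(n_lines, width=8):
    """Fake page: one pixel row per text line, each line with its own gray value."""
    return [[(i * 37) % 256] * width for i in range(n_lines)]


def insert_line(page, index, value=255):
    """Insert an extra row at `index`, then trim to the original height."""
    width = len(page[0])
    shifted = page[:index] + [[value] * width] + page[index:]
    return shifted[:len(page)]


def fraction_different(a, b):
    """Fraction of pixel positions whose values differ (images must be same size)."""
    diff = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return diff / (len(a) * len(a[0]))


if __name__ == "__main__":
    page = make_page(10)
    top = insert_line(page, 0)      # extra line at the top shifts every row
    bottom = insert_line(page, 9)   # extra line before the last row
    print(fraction_different(page, top))
    print(fraction_different(page, bottom))
```

With the extra line at the top, every row lands on a different row than before, so every pixel differs. With it near the bottom, only the final row changes. A feature-based method such as SIFT would not be affected in the same way, which is the trade-off to weigh here.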