ZebraTester's Page Scanner function browses and explores the web pages of a web server automatically and recursively, similar to a web spider or web crawler.

Page Scanner's Purpose

Primary: To turn a "normal" web surfing session into a load test program. This provides a simplified way to create a web surfing session instead of recording single web pages manually.

However, Page Scanner can only be used to acquire web surfing sessions that do not require HTML form-based authentication. This tool is not a replacement for recording web surfing sessions of real web applications.

Other: Page Scanner allows the detection of broken links inside a website and provides statistical data about the largest and slowest web pages. It also supports searching for text fragments across all scanned web pages.

Info

Note 1: Page Scanner does not interpret JavaScript code and does not submit forms. Only hyperlinks are considered. Cookies are automatically supported.

Info

Note 2: Page Scanner keeps the entire scanned website in its transient memory (RAM) in compressed form. This means that large websites can be scanned, but the amount of transient memory available is still limited.

Note

Please note that the Page Scanner tool may return no result or an incomplete result, because some websites or web pages contain malformed HTML code, or because old, unusual HTML options have been used within the scanned web pages. Although this tool has been intensively tested, we cannot provide any warranty for error-free behavior. Some website- or web page-related errors may be impossible to fix because of divergent requirements or complexity. Page Scanner's functionality and behavior are similar to those of search engines, which have similar restrictions.

Overview

GUI Display

The window is divided into two parts.

Scan Result: The upper part of the window shows the progress of a running scan, or the scan results once the scan has been completed.

Page Scanner Input Parameter: The lower part of the window allows you to set the scan input parameters and start a scan.

Page Scanner Parameter Inputs

Starting Web Page

The scan starts from this URL. Optionally, scan only parts of a website by entering a deep-linked URL path; for example, http://www.example.com/sales/customers.html. In this case, only web pages below, or at, the same level of the URL path are scanned.
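The scoping rule for a deep-linked starting URL can be illustrated with a small check. This is a minimal sketch, not Page Scanner's implementation: it assumes that "below, or at, the same level" means the same protocol and host[:port] as the starting URL plus a path that begins with the starting page's directory. The function name `in_scan_scope` is hypothetical.

```python
from urllib.parse import urlsplit


def in_scan_scope(start_url: str, candidate_url: str) -> bool:
    """Return True if candidate_url is at or below the start page's URL path level.

    Hypothetical rule (assumption): same protocol and host[:port] as the start URL,
    and a path that begins with the start page's directory.
    """
    start = urlsplit(start_url)
    cand = urlsplit(candidate_url)
    if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
        return False
    # Directory part of the start path: "/sales/customers.html" -> "/sales/"
    base_dir = start.path[: start.path.rfind("/") + 1] or "/"
    return (cand.path or "/").startswith(base_dir)
```

With the starting page http://www.example.com/sales/customers.html, a page such as /sales/2024/orders.html would be scanned under this rule, while /support/faq.html would not.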

Char Encoding

The default value, Auto Detect, can be overridden if some or all web pages are wrongly encoded, that is, if the character set specified in the HTML header does not match the character set actually used in the HTML body of the web pages (malformed HTML on the server side). If Page Scanner cannot extract hyperlinks (succeeding web pages) from the starting web page, try ISO-8859-1 or UTF-8 as a workaround.
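The effect of a mismatched declared character set, and why an explicit override helps, can be shown with a short decoding sketch. It is illustrative only: the function `decode_page`, the `<meta>`-based detection, and the ISO-8859-1 fallback are assumptions, not Page Scanner's actual Auto Detect logic.

```python
import re
from typing import Optional

# Finds charset=... in <meta charset="..."> or <meta http-equiv=... content="...; charset=...">
_META_CHARSET = re.compile(rb'<meta[^>]+charset=["\']?([A-Za-z0-9_\-]+)', re.IGNORECASE)


def decode_page(body: bytes, override: Optional[str] = None) -> str:
    """Decode an HTML body (hypothetical sketch).

    override=None mimics "Auto Detect": use the charset declared in a <meta> tag,
    falling back to ISO-8859-1 (assumption). An explicit override such as "UTF-8"
    ignores a wrong declaration.
    """
    if override:
        encoding = override
    else:
        match = _META_CHARSET.search(body)
        encoding = match.group(1).decode("ascii") if match else "ISO-8859-1"
    return body.decode(encoding, errors="replace")
```

If a page declares ISO-8859-1 but its body is actually UTF-8, a link such as /über.html decodes to garbled text under the declared charset and correctly only with the UTF-8 override.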

Exclude Path Patterns

Excludes one or more URL path patterns from scanning. Commas separate the path patterns.
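A comma-separated pattern field can be parsed and applied as sketched below. The matching semantics are an assumption (a pattern is taken to exclude any URL path that contains it); Page Scanner's actual matching rules may differ. Both function names are hypothetical.

```python
def parse_exclude_patterns(value: str) -> list[str]:
    # Split the comma-separated field and drop surrounding whitespace and empty entries.
    return [p.strip() for p in value.split(",") if p.strip()]


def is_excluded(url_path: str, patterns: list[str]) -> bool:
    # Assumption: a pattern excludes a URL if it occurs anywhere in the URL path.
    return any(p in url_path for p in patterns)
```

For example, the field value `/archive/, /tmp/` would exclude /archive/2020/index.html but not /docs/index.html.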

Follow Web Servers

Include content and web pages from other web servers within the scan; for example, when images embedded in the web pages are located on another web server. Enter several additional web servers, separated by commas. Example: http://www.example.com , https://imgsrv.example.com:444. The protocol (HTTP or HTTPS), the hostname (usually www), the domain, and the TCP/IP port are considered, but URL paths are NOT considered.
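The comparison described above (protocol, hostname, and port count; URL paths do not) can be sketched as follows. The helper names and the use of default ports 80 and 443 when no port is given are assumptions for illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple[str, str, int]:
    """(protocol, hostname, port) of a URL; the URL path is deliberately ignored."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS[scheme]  # assumption: implicit default ports
    return scheme, (parts.hostname or "").lower(), port


def parse_follow_servers(value: str) -> set[tuple[str, str, int]]:
    # The input field is a comma-separated list of server URLs.
    return {origin(u) for u in value.split(",") if u.strip()}


def is_followed(url: str, start_url: str, follow: set[tuple[str, str, int]]) -> bool:
    # A URL is scanned if it is on the starting server or on one of the listed servers.
    o = origin(url)
    return o == origin(start_url) or o in follow
```

Under this sketch, https://imgsrv.example.com:444/img/logo.png is followed for the example input, whereas https://imgsrv.example.com/img/logo.png (port 443) is not, because the port differs.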

Verify External Links

Verify all external links to all other web servers. This is commonly used to detect broken hyperlinks to other web servers.

Include

Selects which sets of embedded content types are also included in the scan. Page Scanner determines the content type from the file extension of the URL path (if available), because this can be done before the hyperlink of the embedded content itself is processed. This saves execution time, but a few URLs of excluded content types may still appear in the scan result, because the MIME type in the received HTTP response headers is used only to detect web pages. Remove these unwanted URLs after the scan has been completed by using the "remove URL" form in the Display Result window.

Content-Type Sets and Corresponding File Extensions

Images, Flash, CSS, JS: .img .bmp .gif .pct .pict .png .jpg .jpeg .tif .tiff .tga .ico .swf .stream .css .stylesheet .js .javascript

PDF Documents: .pdf

Office Documents: .doc .ppt .pps .xls .mdb .wmf .rtf .wri .vsd .rtx

ASCII Text Files: .txt .text .log .asc .ascii .cvs

Music and Movies: .mp2 .mp3 .mpg .avi .wav .mov .wm .rm .mpeg

Binary Files: .exe .msi .dll .bat .com .pif .dat .bin .vcd .sav
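Classifying a URL by its file extension, without fetching it, might look like the sketch below. The extension lists are copied from the content-type sets above; the function name `content_type_set` is hypothetical, and case-insensitive matching is an assumption.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlsplit

# Content-type sets and their file extensions, as listed in the Include section.
CONTENT_TYPE_SETS = {
    "Images, Flash, CSS, JS": {".img", ".bmp", ".gif", ".pct", ".pict", ".png", ".jpg",
                               ".jpeg", ".tif", ".tiff", ".tga", ".ico", ".swf", ".stream",
                               ".css", ".stylesheet", ".js", ".javascript"},
    "PDF Documents": {".pdf"},
    "Office Documents": {".doc", ".ppt", ".pps", ".xls", ".mdb", ".wmf", ".rtf", ".wri",
                         ".vsd", ".rtx"},
    "ASCII Text Files": {".txt", ".text", ".log", ".asc", ".ascii", ".cvs"},
    "Music and Movies": {".mp2", ".mp3", ".mpg", ".avi", ".wav", ".mov", ".wm", ".rm", ".mpeg"},
    "Binary Files": {".exe", ".msi", ".dll", ".bat", ".com", ".pif", ".dat", ".bin", ".vcd",
                     ".sav"},
}


def content_type_set(url: str) -> Optional[str]:
    """Return the content-type set for a URL, judged only by its path's file extension."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()  # query string is ignored
    for name, extensions in CONTENT_TYPE_SETS.items():
        if suffix in extensions:
            return name
    return None  # no listed extension: not classified by extension
```

Because only the extension is inspected, a URL such as /download?id=7 that actually returns a PDF would not be recognized here, which is why some URLs of excluded content types can still end up in the result.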

Include Options

Allows you to select or de-select specific file extensions using the keywords -add or -remove.

Example: 

-remove .gif -add .mp2
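How the -add and -remove keywords adjust the selected extensions can be sketched with a small parser. This is hypothetical: it assumes whitespace-separated tokens and that each keyword applies to the extensions that follow it until the next keyword; the documented example only shows one extension per keyword.

```python
def apply_include_options(extensions: set[str], options: str) -> set[str]:
    """Apply "-add"/"-remove" keywords to a set of file extensions (hypothetical parser)."""
    result = set(extensions)
    action = None
    for token in options.split():
        if token in ("-add", "-remove"):
            action = token  # keyword applies to the following extension(s)
        elif action == "-add":
            result.add(token.lower())
        elif action == "-remove":
            result.discard(token.lower())
        else:
            raise ValueError(f"extension {token!r} is not preceded by -add or -remove")
    return result
```

Applied to a selection containing .gif and .png, the example options `-remove .gif -add .mp2` leave .png and .mp2.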

Max Scan Time

Limits the maximum scan time in minutes. The scan will be stopped if this time is exceeded.

Max Web Pages

Limits the maximum number of scanned web pages. The scan will be stopped if the maximum number of web pages is exceeded.

Max Received Bytes

Limits the maximum size of the received data (in megabytes), measured over the entire scan. The scan will be stopped if the maximum size of the received data is exceeded.

Max URL Calls

Limits the maximum number of executed URL calls, measured over the entire scan. The scan will be stopped if the maximum number of executed URL calls is exceeded.
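The four limits above all stop the scan once a running total is exceeded. The sketch below expresses that check; the class and field names are hypothetical, and treating one megabyte as 1024 × 1024 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScanLimits:
    max_scan_minutes: float  # Max Scan Time
    max_web_pages: int       # Max Web Pages
    max_received_mb: float   # Max Received Bytes, in megabytes
    max_url_calls: int       # Max URL Calls


@dataclass
class ScanProgress:
    elapsed_seconds: float = 0.0
    web_pages: int = 0
    received_bytes: int = 0
    url_calls: int = 0


def stop_reason(limits: ScanLimits, progress: ScanProgress) -> Optional[str]:
    """Return why the scan must stop, or None to continue. A limit triggers once exceeded."""
    if progress.elapsed_seconds > limits.max_scan_minutes * 60:
        return "Max Scan Time exceeded"
    if progress.web_pages > limits.max_web_pages:
        return "Max Web Pages exceeded"
    if progress.received_bytes > limits.max_received_mb * 1024 * 1024:  # assumption: 1 MB = 2**20 bytes
        return "Max Received Bytes exceeded"
    if progress.url_calls > limits.max_url_calls:
        return "Max URL Calls exceeded"
    return None
```

A scanner loop would call `stop_reason` after each URL call and end the scan as soon as it returns a reason.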

URL Timeout

Defines the response timeout, in seconds, per single URL call. If this timeout expires, the URL call will be reported as failed (no response from web server).

Max Path Depth

Limits the maximum URL path depth of scanned web pages.

Example: http://www.example.com/docs/content/about.html has a path depth of 3.
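A path depth consistent with the example above can be computed by counting the non-empty segments of the URL path, including the file name. The function name `path_depth` is hypothetical.

```python
from urllib.parse import urlsplit


def path_depth(url: str) -> int:
    """Count the non-empty segments of the URL path (query string and host ignored)."""
    return len([segment for segment in urlsplit(url).path.split("/") if segment])
```

For http://www.example.com/docs/content/about.html the segments are docs, content, and about.html, giving a depth of 3.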

Follow Redirections

Limits the total number of followed HTTP redirects during the scan.

Follow Path Repetitions

Limits the number of path repetitions which can occur within a single URL path. This parameter acts as protection against endless loops in scanning, and should usually be set to 1 (default) or 2.

Example: http://www.example.com/docs/content/about.html has a path repetition value of 3.

Follow CGI Parameters

This option (disabled by default) protects against receiving almost identical URLs many times when they differ only in their CGI parameters. If it is disabled, only the first of several similar URLs is processed.

Example: the first URL http://www.example.com/showDoc?context=12 is processed, but the subsequent similar URLs http://www.example.com/showDoc?context=10 and http://www.example.com/showDoc?context=13 are not processed.
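The behavior in the example can be sketched by keeping only the first URL for each protocol, host, and path, ignoring the query string. This assumes "similar" means identical apart from the CGI parameters; the function name is hypothetical.

```python
from urllib.parse import urlsplit


def filter_cgi_variants(urls: list[str], follow_cgi_parameters: bool = False) -> list[str]:
    """Keep only the first URL per protocol/host/path unless CGI parameters are followed."""
    if follow_cgi_parameters:
        return list(urls)
    seen: set[tuple[str, str, str]] = set()
    kept = []
    for url in urls:
        parts = urlsplit(url)
        key = (parts.scheme, parts.netloc, parts.path)  # query string (CGI parameters) ignored
        if key not in seen:
            seen.add(key)
            kept.append(url)
    return kept
```

With the option disabled, the three showDoc URLs from the example collapse to the first one, showDoc?context=12.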

Authentication

Allows scanning protected web sites (or web pages).

Browser Language

Sets which default language should be preferred when scanning multilingual web sites.
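In HTTP, a client usually states its preferred languages with the standard Accept-Language request header. The sketch below assumes (this is not confirmed by the documentation) that the Browser Language setting is sent this way; it only builds the request and does not send it. The function name `build_request` is hypothetical.

```python
import urllib.request


def build_request(url: str, language: str) -> urllib.request.Request:
    # Assumption: the preferred language is transmitted as an Accept-Language header.
    request = urllib.request.Request(url)
    request.add_header("Accept-Language", language)
    return request
```

A multilingual server that performs content negotiation would then return, for example, the German version of a page for the value `de-CH,de;q=0.9,en;q=0.5`.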

Use Proxy

Applies the Next Proxy Configuration from the Personal Settings menu when scanning through an (outgoing) proxy server.

SSL Version

Select the SSL protocol version to communicate with HTTPS servers (encrypted connections).

Annotation

Enter a short comment about the scan.

Analyze Scan

Convert Scan Result

A Page Scanner result can be converted into a "normal" web surfing session, which can be used to create a load test program.