
The window is divided into two parts.

Scan Result: The upper part of the window shows the scan's progress or, once the scan has completed, the scan results.


Page Scanner Input Parameter: The lower part of the window allows you to enter scan input parameters and start a scan.



Page Scanner Parameter Inputs

Starting Web Page

The scan starts from this URL. Optionally, scan only part of a website by entering a deep-linked URL path; for example, http://www.example.com/sales/customers.html. In this case, only web pages at or below the same level of the URL path are scanned.
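For illustration only, here is a minimal sketch (not ZebraTester code; the helper name in_scope is invented) of how such a scope check can work:

    from urllib.parse import urlparse

    def in_scope(start_url, candidate_url):
        # A candidate URL is in scope if it is on the same server and
        # at or below the directory level of the starting URL path.
        start, cand = urlparse(start_url), urlparse(candidate_url)
        if (start.scheme, start.netloc) != (cand.scheme, cand.netloc):
            return False
        base_dir = start.path.rsplit("/", 1)[0] + "/"   # e.g. "/sales/"
        return cand.path.startswith(base_dir)

    # in_scope("http://www.example.com/sales/customers.html",
    #          "http://www.example.com/sales/2024/list.html")  -> True
    # in_scope("http://www.example.com/sales/customers.html",
    #          "http://www.example.com/about.html")            -> False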

Char Encoding

The default value, Auto Detect, can be overridden in case some or all web pages are wrongly coded, such that the character set specified in the HTML header does not match the character set actually used within the HTML body of the web pages (malformed HTML at the server side). You can try ISO-8859-1 or UTF-8 as a workaround if the Page Scanner cannot extract hyperlinks (succeeding web pages) from the starting web page.
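To illustrate the underlying problem (a sketch under the assumption that decoding is simply retried with fallback charsets; this is not product code):

    def decode_body(body_bytes, declared_charset=None):
        # Try the charset declared in the HTTP header first, then UTF-8,
        # then ISO-8859-1 (which accepts any byte sequence).
        for charset in filter(None, (declared_charset, "utf-8", "iso-8859-1")):
            try:
                return body_bytes.decode(charset)
            except (UnicodeDecodeError, LookupError):
                continue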

Exclude Path Patterns

Excludes one or more URL path patterns from scanning. Separate multiple path patterns with commas.
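The exact matching rule is not specified here, so the following sketch (an invented helper using simple substring matching) only illustrates how a comma-separated pattern list can be applied:

    def is_excluded(url_path, exclude_field):
        # exclude_field is the comma-separated input value,
        # e.g. "/archive, /private/reports"
        patterns = [p.strip() for p in exclude_field.split(",") if p.strip()]
        return any(pattern in url_path for pattern in patterns)

    # is_excluded("/archive/2020/index.html", "/archive, /private") -> True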

Follow Web Servers

Includes content and web pages from other web servers in the scan; for example, when images embedded in the web pages are located on another web server. Enter additional web servers separated by commas. Example: http://www.example.com, https://imgsrv.example.com:444. The protocol (HTTP or HTTPS), the hostname (usually www), the domain, and the TCP/IP port are considered, but URL paths are NOT considered.
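As a sketch of the matching rule (protocol, host, and port are compared; paths are ignored), with invented helper names and the usual default ports assumed:

    from urllib.parse import urlparse

    DEFAULT_PORTS = {"http": 80, "https": 443}

    def server_key(url):
        # Only protocol, hostname, and TCP/IP port identify a web
        # server; the URL path is deliberately ignored.
        u = urlparse(url)
        return (u.scheme, u.hostname, u.port or DEFAULT_PORTS[u.scheme])

    followed = "http://www.example.com, https://imgsrv.example.com:444"
    allowed = {server_key(s.strip()) for s in followed.split(",")}

    def may_follow(url):
        return server_key(url) in allowed

    # may_follow("https://imgsrv.example.com:444/img/logo.png") -> True
    # may_follow("http://other.example.org/index.html")         -> False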

Verify External Links

Verify all external links to all other web servers. This is commonly used to detect broken hyperlinks to other web servers.

Include

Determines which sets of embedded content types are also included in the scan. The Page Scanner uses the file extension of the URL path to determine the content type (if available), because this can be done before the hyperlink of the embedded content itself is processed. This saves execution time, but it means that a few URLs of excluded content types may still flow into the scan result, because the MIME type of the received HTTP response headers is only used to detect web pages. Remove these unwanted URLs after the scan has completed by using the "remove URL" form in the Display Result window. A lookup sketch follows the table below.

Content-Type Sets and Corresponding File Extensions

Images, Flash, CSS, JS: .img, .bmp, .gif, .pct, .pict, .png, .jpg, .jpeg, .tif, .tiff, .tga, .ico, .swf, .stream, .css, .stylesheet, .js, .javascript

PDF Documents: .pdf

Office Documents: .doc, .ppt, .pps, .xls, .mdb, .wmf, .rtf, .wri, .vsd, .rtx

ASCII Text Files: .txt, .text, .log, .asc, .ascii, .cvs

Music and Movies: .mp2, .mp3, .mpg, .avi, .wav, .mov, .wm, .rm, .mpeg

Binary Files: .exe, .msi, .dll, .bat, .com, .pif, .dat, .bin, .vcd, .sav
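To make the extension-based detection concrete, here is a sketch (extension sets abridged from the table above; helper names are invented):

    import os
    from urllib.parse import urlparse

    CONTENT_TYPE_SETS = {
        "Images, Flash, CSS, JS": {".gif", ".png", ".jpg", ".css", ".js"},
        "PDF Documents": {".pdf"},
        "Office Documents": {".doc", ".ppt", ".xls"},
    }

    def content_type_set(url):
        # Classify by the file extension of the URL path, which is
        # known before the embedded content itself is fetched.
        ext = os.path.splitext(urlparse(url).path)[1].lower()
        for set_name, extensions in CONTENT_TYPE_SETS.items():
            if ext in extensions:
                return set_name
        return None   # unknown: only the response MIME type can tell

    # content_type_set("http://www.example.com/img/logo.png")
    # -> "Images, Flash, CSS, JS"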

Include Options

Allows you to select or de-select specific file extensions using the keywords -add or -remove.

Example: 

-remove .gif -add .mp2
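A sketch of how such an option string can be parsed (an invented helper; the product's exact parsing rules are not documented here):

    def apply_include_options(extensions, options):
        # Adjust a selected extension set according to an option
        # string such as "-remove .gif -add .mp2".
        selected = set(extensions)
        tokens = options.split()
        for keyword, ext in zip(tokens[::2], tokens[1::2]):
            if keyword == "-add":
                selected.add(ext)
            elif keyword == "-remove":
                selected.discard(ext)
        return selected

    # apply_include_options({".gif", ".png"}, "-remove .gif -add .mp2")
    # -> {".png", ".mp2"}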

Max Scan Time

Limits the maximum scan time in minutes. The scan will be stopped if this time is exceeded.

Max Web Pages

Limits the maximum number of scanned web pages. The scan will be stopped if the maximum number of web pages is exceeded.

Max Received Bytes

Limits the maximum size of the received data (in megabytes), measured over the entire scan. The scan will be stopped if the maximum size of the received data is exceeded.

Max URL Calls

Limits the maximum number of executed URL calls, measured over the entire scan. The scan will be stopped if the maximum number of executed URL calls is exceeded.

URL Timeout

Defines the response timeout, in seconds, per single URL call. If this timeout expires, the URL call will be reported as failed (no response from the web server).

Max Path Depth

Limits the maximum URL path depth of scanned web pages.

Example: http://www.example.com/docs/content/about.html has a path depth of 3.
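The depth simply counts the path segments, as this sketch (invented helper) shows:

    from urllib.parse import urlparse

    def path_depth(url):
        # "/docs/content/about.html" -> ["docs", "content", "about.html"] -> 3
        return len([s for s in urlparse(url).path.split("/") if s])

    # path_depth("http://www.example.com/docs/content/about.html") -> 3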

Follow Redirections

Limits the total number of followed HTTP redirects during the scan.

Follow Path Repetitions

Limits the number of path repetitions which can occur within a single URL path. This parameter acts as protection against endless loops in scanning, and should usually be set to 1 (default) or 2.

Example: in http://www.example.com/docs/docs/docs/about.html, the path element "docs" occurs three times, so the URL has a path repetition value of 3.
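A sketch of how the repetition value can be computed (invented helper):

    from collections import Counter
    from urllib.parse import urlparse

    def max_path_repetition(url):
        # Highest number of occurrences of any single path element.
        segments = [s for s in urlparse(url).path.split("/") if s]
        return max(Counter(segments).values(), default=0)

    # max_path_repetition("http://www.example.com/docs/docs/docs/about.html") -> 3
    # max_path_repetition("http://www.example.com/docs/content/about.html")   -> 1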

Follow CGI Parameters

This option (disabled by default) acts as protection against receiving almost identical URLs many times when they differ only in their CGI parameters. If the option is disabled, only the first of these similar URLs will be processed.

Example: the first URL http://www.example.com/showDoc?context=12 will be processed, but the subsequent similar URLs http://www.example.com/showDoc?context=10 and http://www.example.com/showDoc?context=13 will not be processed.
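A sketch of the deduplication idea (invented helper; the URL without its query string is assumed to serve as the identity key when the option is disabled):

    from urllib.parse import urlparse

    processed = set()

    def should_process(url, follow_cgi_parameters=False):
        # With the option disabled, URLs that differ only in their
        # CGI (query) parameters count as the same URL.
        u = urlparse(url)
        key = url if follow_cgi_parameters else (u.scheme, u.netloc, u.path)
        if key in processed:
            return False
        processed.add(key)
        return True

    # should_process("http://www.example.com/showDoc?context=12") -> True
    # should_process("http://www.example.com/showDoc?context=10") -> False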

Authentication

Allows scanning protected web sites (or web pages).

Browser Language

Sets which default language should be preferred when scanning multilingual web sites.

Use Proxy

Applies the Next Proxy Configuration from the Personal Settings menu when scanning through an (outgoing) proxy server.

SSL Version

Selects the SSL protocol version used to communicate with HTTPS servers (encrypted connections).

Annotation

Enter a short comment about the scan.

Analyze Scan: see "Analyzing the Scan Result" below.

Convert Scan Result: see "Converting a Scan Result into a Web Surfing Session" below.

Authentication

Allows scanning protected web sites (or web pages).

Supported Authentication Methods

Basic: Applies HTTP Basic Authentication (a Base64-encoded username:password sent within all HTTP request headers; see the sketch below). Also enter a username and password into the corresponding input fields.

NTLM: Applies NTLM authentication for all URL calls (if requested by the web server). The NTLM configuration of the Personal Settings menu will be used.

PKCS#12 Client Certificate: Applies an HTTPS/SSL client certificate for authentication. The active PKCS#12 client certificate of the Personal Settings menu will be used.
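Basic Authentication itself is standardized (RFC 7617); the following sketch shows the header construction described above (not product code):

    import base64

    def basic_auth_header(username, password):
        # Base64-encode "username:password" and send it in the
        # Authorization header of every HTTP request.
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        return {"Authorization": f"Basic {token}"}

    # basic_auth_header("alice", "secret")
    # -> {'Authorization': 'Basic YWxpY2U6c2VjcmV0'}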

Scan Options


ABORT: You can abort a running scan by clicking the "Abort Scan" ("X") icon.


DISPLAY: Displays the scan result.


CONVERT: Converts the Page Scanner result into a "normal" web surfing session (.prxdat), from which a load test program can be created for additional ZebraTester actions.

  • A filename, without path or file extension, is required.

  • An annotation is recommended to provide a hint in Project Navigator.

  • Click Convert and Save when ready.

  • Optionally display the newly converted session in the Main Menu.

Filename

The filename of the web surfing session. You must enter a "simple" filename, with no path and no file extension. The file extension is always .prxdat. The file will be saved in the selected Project Navigator directory.

Web Pages

Selects the scanned web pages which should flow into the web surfing session. “All Pages” means that all scanned web pages are selected. Alternatively, the option “Page Ranges” allows you to select one or several ranges of page numbers. If you use several ranges, they must be separated by commas.

Example: "1, 3-5, 7, 38-81"

Max. URL Calls

Limits the number of URL calls that should flow into the web surfing session. 
Tip: Apica recommends not converting more than 1,000 URL calls into a web surfing session.

Annotation

Enter a short comment about the web surfing session. This will become a hint in Project Navigator.

Load Session into

Optionally loads the web surfing session into the transient memory area of the Main Menu, or into one of two memory Scratch Areas of the Session Cutter.


SAVE: When a scan has completed, save the scan result to a file. The file will be saved in the selected Project Navigator directory and will always have the file extension .prxscn. Scan results can be restored and loaded back into the Page Scanner by clicking on the corresponding "Load Page Scan" icon inside Project Navigator.


DISCARD: Discards the scan result.


Analyzing the Scan Result


The most important statistical data about the scan are shown in the summary/overview, near the top of the window. Below the overview, select the various scan result details.

The search form, on the right side near the scan result detail selection, allows you to search for an ASCII text fragment over all web pages of the scan result. By default, the text fragment is searched for within all HTTP request headers, all HTTP response headers, and all HTTP response content data.

The remove URL form, shown below the scan result detail selection, allows you to remove specific sets of URLs from the scan result. The set of removed URLs is selected by the received MIME type (for example, IMAGE/GIF or APPLICATION/PDF), combined by a logical AND condition with either the received HTTP status code of the URLs (200, 302, and so on) or a Page Scanner error code such as "network connection failed".

with content MIME type

Selects a specific MIME type. The input field is case insensitive (upper- and lowercase characters are treated as identical). any means that all MIME types are selected, independent of their value. none means that only URL calls whose HTTP response headers do NOT contain MIME type information (HTTP response header field "Content-Type" not set) are selected.

HTTP status code

Selects an HTTP status code or a Page Scanner error code.
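A sketch of the combined filter (hypothetical URL-call records with content_type and status fields; not product code):

    def matches_remove_filter(call, mime_type, status):
        # A URL call is removed when BOTH conditions match (logical AND).
        if mime_type == "any":
            mime_ok = True
        elif mime_type == "none":
            mime_ok = call.get("content_type") is None
        else:
            mime_ok = (call.get("content_type") or "").lower() == mime_type.lower()
        return mime_ok and call.get("status") == status

    # Remove all PDF documents that were received with status 200:
    # calls = [c for c in calls if not matches_remove_filter(c, "application/pdf", 200)]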

Note: A few URLs with excluded content types may flow into the scan result (not selected by scan input parameter). You can use the "remove URL" form to clean up the scan result, and to remove any unwanted URLs. The most common case is to remove PDF documents from the scan result.

Scan Result Details


The Scan Input Parameter displays all input parameters for the scan (without authentication data).


 

Scan Statistic displays some additional statistical data about the scan. Similar Web Pages is the number of web pages with duplicate content (same content but different URL path). Failed URL Calls is the number of URL calls that failed, meaning that either no HTTP status code was available (no response received from the web server) or the received HTTP status was an error code (400-599).

 


Non-Processed Web Servers displays a summary of all web servers that have been found in hyperlinks but whose web pages or page elements have not been scanned. The number before the server name shows how many times hyperlinks to that server were ignored by the Page Scanner.

 


Scan Result per Web Page displays all scanned web pages. The embedded content of a web page, such as images, is always displayed in a Web Browser Cached View. This means, for example, that a particular (unique) image is shown only once, inside the web page in which it was referenced for the first time; all subsequent web pages will not show the same embedded content. This behavior is roughly equal to what a web browser does: it caches duplicate references over all web pages within a web surfing session. A minimal sketch of this caching behavior follows below.

More details about a specific URL call can be shown by clicking on the corresponding URL hyperlink.
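A minimal sketch of the cached view (hypothetical page records; not product code):

    def cached_view(pages):
        # Show each embedded element only on the first page that
        # references it, like a browser cache across the session.
        seen = set()
        for page in pages:
            fresh = [e for e in page["embedded"] if e not in seen]
            seen.update(page["embedded"])
            yield {"url": page["url"], "embedded": fresh}

    # pages = [{"url": "/a.html", "embedded": ["logo.png", "s.css"]},
    #          {"url": "/b.html", "embedded": ["logo.png", "x.png"]}]
    # -> /a.html shows logo.png and s.css; /b.html shows only x.png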



Broken Links displays a list of all broken hyperlinks.


 

Duplicated Content displays a list of URLs with duplicate content (same content but different URL path).

Largest Web Pages displays a list of the largest web pages.


Slowest Web Pages displays a list of the slowest web pages.


Tip: You can click the bars to display the corresponding page details.

Converting a Scan Result into a Web Surfing Session

A Page Scanner result can be converted into a "normal" web surfing session, which can then be used to create a load test program.


Input Fields

The input fields of this dialog (Filename, Web Pages, Max. URL Calls, Annotation, and Load Session into) are the same as those described above for the CONVERT scan option.

 


After the web surfing session has been stored, it will be automatically loaded into the Main Menu if the “Load Session into” checkbox was selected. After this, you can generate the load test program.
