Web Automation
Web Services
Web services are becoming a standard for application integration. They are
part of Service-Oriented Architecture (SOA). Unfortunately, only a small
percentage of existing Web applications provide a Web service interface (API).
Without an API, direct application integration is not possible.
The following approaches can be used to access the front end of a Web application
and create the missing API: "Raw" HTTP, "Raw" IE automation, and SWExplorerAutomation (SWEA).
SWEA converts a Web application into programmable objects: scenes
(pages) and controls. These objects are defined visually in a visual designer
and are accessible from any .NET language.
Pros
- Can work with any Web page shown in IE.
- Doesn't require knowledge of TCP/IP, HTTP, HTTPS, cookies, etc.
- Separates data extraction from program logic.
- Effectively handles error conditions.
- Takes minutes to write code.
- Can run from a Windows service or from ASP.NET.
Cons
SWEA allows Web applications to be "service enabled". The
functionality contained within an existing Web application can be exposed as
Web services. SWEA does not require any modification of the existing
Web application. It provides a rapid implementation cycle with minimal risk.
Interactive Tutorials and Demonstrations
SWEA automation solutions can be used to develop interactive (live) tutorials
for existing Web applications.
Data Extraction
The Web is the largest source of information ever created. With SWEA,
information such as address data, price lists, images, news, and publications
can easily be extracted and integrated into your in-house information system.
"Raw" IE automation
This solution is based on accessing the HTML DOM. It uses Internet Explorer automation, or hosts the
WebBrowser control, to get access to the HTML DOM.
Pros
- Can work with any Web page shown in IE.
- Doesn't require knowledge of TCP/IP, HTTP, HTTPS, cookies, etc.
Cons
- Changes to the Web site layout will break the extraction.
- Requires a good knowledge of Web browser events, the HTML DOM, and COM.
- Not as fast as the HTTP approach.
- Time-consuming to implement.
"Raw" HTTP
HTTP is the "raw" approach. It uses WebRequest (.NET) to download the page
source locally. The data can then be extracted with XPath or regular expressions.
To use XPath, the page source must first be converted to XML (XHTML) using HTML Tidy or another conversion tool.
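The two extraction styles described above can be sketched as follows. This is only an illustration, not the SWEA or .NET implementation: it uses Python's standard library as a stand-in for WebRequest, and the page source, table id, and class names are hypothetical. The page is an inline string so the sketch runs without a network connection, and it is already well-formed XHTML, so no HTML Tidy step is shown.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical page source. In a real program this string would come from an
# HTTP download (for example urllib.request.urlopen(url).read().decode()).
PAGE = """<html><body>
<table id="prices">
<tr><td class="item">Widget</td><td class="price">9.99</td></tr>
<tr><td class="item">Gadget</td><td class="price">24.50</td></tr>
</table>
</body></html>"""


def extract_with_regex(source):
    """Pull (item, price) pairs straight out of the raw markup."""
    pattern = re.compile(
        r'<td class="item">(.*?)</td><td class="price">(.*?)</td>'
    )
    return [(name, float(price)) for name, price in pattern.findall(source)]


def extract_with_xpath(source):
    """Parse the page as XML, then select the cells with XPath-style paths."""
    root = ET.fromstring(source)  # requires well-formed XML (XHTML)
    rows = root.findall(".//table[@id='prices']/tr")
    return [
        (
            row.find("td[@class='item']").text,
            float(row.find("td[@class='price']").text),
        )
        for row in rows
    ]


if __name__ == "__main__":
    print(extract_with_regex(PAGE))
    print(extract_with_xpath(PAGE))
```

Both functions return the same list of pairs for this page. The regular expression depends on the exact markup of each row, while the XPath version depends on the page being valid XML.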
Pros
- Performance is very fast.
Cons
- Requires knowledge of TCP/IP, HTTP, HTTPS, cookies, etc.
- Because HTML is often not well formed, HTML to XML conversion will not always work.
- Very unstable. Even simple changes to a Web page layout will break the extraction.
- Will not work with Web pages whose content is generated by JavaScript.
- Time-consuming to implement.
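Two of the drawbacks listed for the "raw" HTTP approach can be reproduced in a short sketch. It assumes Python's standard library in place of the .NET tooling, and the markup fragments and the pattern are hypothetical. The first part shows ordinary, not well-formed HTML being rejected by a strict XML parser; the second shows a regular expression that silently returns nothing after a small layout change.

```python
import re
import xml.etree.ElementTree as ET

# Typical browser-tolerated HTML: <br> and <p> are never closed.
MALFORMED = "<html><body><p>Price: 9.99<br><p>In stock</body></html>"
WELL_FORMED = "<html><body><p>Price: 9.99<br/></p><p>In stock</p></body></html>"


def parses_as_xml(source):
    """Return True only if the source is well-formed XML."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


# A pattern written against the original layout of a hypothetical page.
PRICE_PATTERN = re.compile(r'<td class="price">(.*?)</td>')
ORIGINAL_LAYOUT = '<tr><td class="price">9.99</td></tr>'
REDESIGNED_LAYOUT = '<div><span class="price">9.99</span></div>'


def extract_prices(source):
    """Return every price the pattern can find in the given markup."""
    return PRICE_PATTERN.findall(source)


if __name__ == "__main__":
    print(parses_as_xml(MALFORMED))
    print(parses_as_xml(WELL_FORMED))
    print(extract_prices(ORIGINAL_LAYOUT))
    print(extract_prices(REDESIGNED_LAYOUT))
```

The price is still present in the redesigned fragment, but the extraction returns an empty list without raising an error, so this kind of breakage is easy to miss.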