Off-the-shelf and open-source tools for ATOM feeds

Currently, agency reporting is provided as Excel/CSV/PDF files. Existing web data scraping/harvesting technology can be used to aggregate this data and provide it as REST/ATOM services. The Kapow Technologies (kapowtech.com) Mashup Server products can provide the aggregation and programmatic access for both structured and unstructured data. The existing agency data sources can be collected automatically (using "robots", as Kapow defines them) and transformed for aggregation, creating new abstractions.
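To make the "robot" idea concrete, here is a minimal sketch (not Kapow's actual API) of the first step: crawling an agency recovery page and pulling out links to downloadable report files for downstream aggregation. The sample HTML and URL patterns are hypothetical.

```python
# Hypothetical sketch of a harvesting "robot": scan a page for links
# to report files (CSV/Excel/PDF) that a later step would download.
from html.parser import HTMLParser

REPORT_EXTENSIONS = (".csv", ".xls", ".xlsx", ".pdf")

class ReportLinkExtractor(HTMLParser):
    """Collects href values that point at downloadable report files."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().endswith(REPORT_EXTENSIONS):
                    self.links.append(value)

# Illustrative page content; a real robot would fetch this over HTTP.
sample_page = """
<html><body>
  <a href="/recovery/spending_q1.csv">Q1 spending</a>
  <a href="/recovery/summary.pdf">Summary</a>
  <a href="/about.html">About</a>
</body></html>
"""

parser = ReportLinkExtractor()
parser.feed(sample_page)
print(parser.links)  # the report links, in page order
```

A commercial product like Kapow wraps this pattern in a visual designer and scheduler, but the underlying mechanics are this simple.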

The Data Collection product can be used to collect and transform data available at standard agency.gov/Recovery URLs or from MAX. This places minimal burden on the tiers of recipients along the value chain.

Hybrid collection scenarios are also possible, combining different types of structured and unstructured data sources. You can harvest page content (agency reports) along with data files. The content can then be written to a MySQL database, or to Excel files, to be repurposed.
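The "written to a database" step might look like the sketch below, assuming the harvested rows have already been normalized to (agency, award_id, amount) tuples. SQLite stands in here for the MySQL target mentioned above, and the schema and field names are illustrative, not a prescribed design.

```python
# Sketch of loading harvested records into a database.
# SQLite (in-memory) stands in for MySQL; schema is hypothetical.
import sqlite3

harvested_rows = [
    ("DOT", "A-1001", 250000.0),
    ("DOE", "A-1002", 1200000.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE awards (agency TEXT, award_id TEXT PRIMARY KEY, amount REAL)"
)
# INSERT OR REPLACE keeps repeated harvest runs idempotent per award_id.
conn.executemany(
    "INSERT OR REPLACE INTO awards VALUES (?, ?, ?)", harvested_rows
)
conn.commit()

total = conn.execute("SELECT SUM(amount) FROM awards").fetchone()[0]
print(total)  # aggregate spending across harvested awards
```

Once the rows are in a database, repurposing them as Excel exports, feeds, or mashup inputs is a query away.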

The harvested data can be transformed (ETL) and written to a database, or published as feeds and services (REST, WADL, RSS, ATOM) using the Web 2.0 edition product. These feeds and services can also be mashed up using mashup platforms (Serena, BEA Ensemble/Pages, IBM Mashup Center/QEDWiki). They can also be consumed by custom applications built with toolkits for AJAX/PHP/Ruby, Eclipse, NetBeans, and Visual Studio.
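For a sense of what the ATOM publishing step produces, here is a minimal sketch that serializes harvested records as an ATOM feed using only the standard library. A product like Kapow's Web 2.0 edition would generate this for you; the feed id, URNs, and timestamps below are hypothetical placeholders.

```python
# Sketch of publishing harvested award records as an ATOM feed.
# Feed identifiers and dates are illustrative, not a real endpoint.
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)

def build_feed(records):
    """Build an ATOM feed with one entry per (agency, award_id, amount)."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    ET.SubElement(feed, f"{{{ATOM_NS}}}title").text = "Agency Recovery Awards"
    ET.SubElement(feed, f"{{{ATOM_NS}}}id").text = "urn:example:recovery-awards"
    ET.SubElement(feed, f"{{{ATOM_NS}}}updated").text = "2009-04-01T00:00:00Z"
    for agency, award_id, amount in records:
        entry = ET.SubElement(feed, f"{{{ATOM_NS}}}entry")
        ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"{agency} award {award_id}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:example:{award_id}"
        ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = "2009-04-01T00:00:00Z"
        ET.SubElement(entry, f"{{{ATOM_NS}}}summary").text = f"${amount:,.2f}"
    return ET.tostring(feed, encoding="unicode")

xml_text = build_feed([("DOT", "A-1001", 250000.0)])
print(xml_text)
```

Any ATOM-aware client, mashup platform, or feed reader can then consume the result without knowing anything about the original Excel/CSV/PDF sources.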

The Portal Content edition product can be used to clip Recovery Act content from agency sites (each agency’s page becomes a portlet) into a single Web presence (Recovery.gov/agencypages).

Additional data transformation capabilities can be delivered by combining these with open-source data integration products such as Mural, Jitterbit, Talend, Apatar, Kettle, CloverETL, etc.

There are also commercial tools such as Informatica, Cast Iron, etc.

IBM’s Mashup Center can also be explored for similar functionality. I expect Kapow to be less costly and more functional.

Why is it important?

Recovery.gov has the potential to become a catalyst for adoption of the next generation of the Web. However, the public expects reporting and transparency to arrive as quickly as the money is being distributed.

We can deliver data feeds and web services now while working towards a Semantic Web/Linked Open Data implementation and web APIs.

Using enterprise-tested tools with low-complexity implementations such as Kapow can open the gates for public consumption. Ultimately it would also support the long-term collection and aggregation of data from providers with a wide range of capabilities as part of the Data Portal + Edge reporting system.
