Case Studies: U.S. Census Bureau

The Client

The U.S. Census Bureau serves as the leading source of quality data about the nation's people and economy.

The Challenge

The U.S. Census Bureau's primary Web server hosts more than a million pages maintained by 60 disseminator groups comprising more than 200 disseminators. The Technology Applications Branch of the Systems Support Division needed to impose greater access control and accountability on a highly decentralized Web publishing model without losing any of the important features of the original system.

The original system allowed disseminators to publish directly to the U.S. Census Bureau's Internet Web server. Although this arrangement was fast and easy for the disseminators, it had several deficiencies:

  1. If a Web page had an error in it, such as a broken link, the error would not be apparent until after the page was published, and many users might have downloaded the bad page before it was taken down or fixed.
  2. If a file was mistakenly removed or modified, there was no backup copy of the most recent version of that document.
  3. Access to the Web server was via a non-dedicated host shared by developers, publishers, and other users.
  4. Changes to Web content were not logged very effectively.

The decision was made to implement a 3-tier solution:

  1. An internal development platform to include both developers and publishers
  2. An internal staging server with FTP-only access, restricted only to publishers
  3. The production Web server, with no FTP access and no user accounts

The challenge was to ensure that the Web content on the production server was kept up-to-date. Disseminators needed a quick and easy way to push their changes from the staging system to the production system. Speed in publishing was a particular concern for Web publishers in the Economic Branch who publish time-sensitive economic indicators, such as Retail Sales and New Home Sales.

The Solution

In keeping with the client's wishes, a solution was developed, tested, and implemented using freely available open-source tools such as Perl, rsync, and MySQL. Implementation proceeded incrementally as the groups of disseminators were trained by the Quotient on-site developer.

The solution consists of a constantly running server process on the staging system. The process continually scans the Web directory hierarchy for either of two specially named service request files, or token files. Once a disseminator places a token file in the directory that is to be synchronized, the following services are performed:

  1. the directory is synchronized
  2. changes are logged to a database
  3. the e-mail address of the requestor is determined and a session log is sent to that user

The system is flexible enough to run multiple requests in parallel.
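
The scan-and-synchronize cycle described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Perl implementation that was actually deployed. The token file names, the rsync options chosen, and the `target` string are all assumptions made for the example; the source does not state the real names or what distinguishes the two token types. Determining the requestor's e-mail address and writing to the database are left out.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Hypothetical token file names; the real names are not given in the source.
TOKEN_NAMES = {".sync_request_a", ".sync_request_b"}


def find_requests(web_root):
    """Return (directory, token_name) pairs for every token file under web_root."""
    requests = []
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in sorted(TOKEN_NAMES.intersection(filenames)):
            requests.append((dirpath, name))
    return requests


def build_rsync_command(directory, web_root, target):
    """Build an rsync command that copies one staging directory to production."""
    relative = os.path.relpath(directory, web_root)
    source = os.path.join(directory, "")  # trailing slash: copy contents, not the dir
    destination = target.rstrip("/") + "/" + relative + "/"
    cmd = ["rsync", "-a", "--itemize-changes"]
    # Token files are requests, not content, so keep them off production.
    cmd += ["--exclude=" + name for name in sorted(TOKEN_NAMES)]
    return cmd + [source, destination]


def handle_request(directory, token, web_root, target, run=subprocess.run):
    """Consume one token file, run rsync, and return a small session log."""
    os.remove(os.path.join(directory, token))  # so the request is not picked up twice
    result = run(build_rsync_command(directory, web_root, target),
                 capture_output=True, text=True)
    return directory, result.returncode, result.stdout


def scan_once(web_root, target, run=subprocess.run, workers=4):
    """One pass of the scanner; pending requests are handled in parallel."""
    requests = find_requests(web_root)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handle_request, d, t, web_root, target, run)
                   for d, t in requests]
        return [f.result() for f in futures]
```

A long-running service would call `scan_once` in a loop with a short sleep between passes. The `run` parameter exists only so the sketch can be exercised without a real rsync target.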

The Results

With the new system:

  1. Errors in a Web page may be identified and rectified on the internal staging server.
  2. Files mistakenly removed from the staging server can be recovered from the production Web server.
  3. There is no direct way to publish to the production Web server. All changes are funneled through the new system.
  4. All changes to Web content are logged into a structured SQL-searchable database for easy reporting and analysis.
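
To give a sense of what a SQL-searchable change log makes possible, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL database used in the project, and the table name, column names, and sample query are hypothetical; the actual schema is not described in the source.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) a change log with a hypothetical schema."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS change_log (
               id        INTEGER PRIMARY KEY,
               logged_at TEXT NOT NULL,
               requestor TEXT NOT NULL,
               directory TEXT NOT NULL,
               file_path TEXT NOT NULL,
               action    TEXT NOT NULL
           )"""
    )
    return conn


def log_change(conn, logged_at, requestor, directory, file_path, action):
    """Record one changed file from a synchronization request."""
    conn.execute(
        "INSERT INTO change_log (logged_at, requestor, directory, file_path, action) "
        "VALUES (?, ?, ?, ?, ?)",
        (logged_at, requestor, directory, file_path, action),
    )
    conn.commit()


def changes_by_requestor(conn):
    """Example report: number of logged changes per requestor."""
    return conn.execute(
        "SELECT requestor, COUNT(*) FROM change_log "
        "GROUP BY requestor ORDER BY requestor"
    ).fetchall()
```

With the changes in a table like this, questions such as "who changed this directory last week" become ordinary SQL queries rather than searches through text logs.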

With the efficiencies inherent in the rsync protocol and a program design that supports parallel processing, synchronization requests are satisfied very quickly, usually within 10 to 15 seconds.
