Better Know a Former Congressmember with Grep

The overview to a four-part homework assignment in looking up and comparing lobbying and U.S. Congressional activity.

This page serves as the overview for the 4-part homework project on using computational techniques to automate the collection, parsing, and filtering of data related to lobbying activity and the United States Congress.

(This overview and the assignments are still under construction. Expect a due date in early March.)

The title of this project is inspired by The Colbert Report's 434-part series, Better Know a District.

Find something interesting, write a program to count it

This mini-project is meant to be both a review of programming concepts and an example of how simple (but brute-force-powered) data-filtering techniques can (and should) be applied to interesting information problems, which is basically the theme of this entire course.

The abstract goal of this project: match a list of names to another list (e.g. lobbying activity) that contains names and see if the matches contain anything interesting.
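As a rough illustration of that abstract goal, here is a minimal sketch of matching one list of names against another with grep. The file names, the pipe-delimited layout, and every record in it are made up for this example; the real datasets will be messier.

```shell
#!/usr/bin/env bash
# Sketch only: all file names and records here are hypothetical.
workdir=$(mktemp -d)

# A list of last names to look for, one per line
cat > "$workdir/names.txt" <<'EOF'
DOE
ROE
EOF

# A second list (e.g. lobbying records) that contains names among other fields
cat > "$workdir/filings.txt" <<'EOF'
DOE|JOHN|2012|EXAMPLE LOBBYING FIRM
SMITH|JANE|2013|ANOTHER EXAMPLE FIRM
ROE|RICHARD|2014|EXAMPLE LOBBYING FIRM
EOF

# -F: treat each pattern as a fixed string, not a regular expression
# -w: match whole words only, so DOE does not match DOERING
# -f: read the patterns from a file, one per line
matches=$(grep -F -w -f "$workdir/names.txt" "$workdir/filings.txt")

# Count the non-empty lines among the matches
match_count=$(printf '%s\n' "$matches" | grep -c .)

printf '%s\n' "$matches"
echo "Matching records: $match_count"
```

Matching on last name alone will produce plenty of false positives with real data, which is part of why deciding whether a match is interesting stays a human job.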

But what is "interesting"? That word means entirely different things, depending on whether you are a government official, journalist, public advocate, academic, or hedge-fund analyst.

Off the top of my head, for the scope of this assignment, a shortlist of possibly interesting things:

So deciding what is interesting is not a computational problem. But gathering and filtering the information is most definitely a problem for a computer to solve for us. Thanks to the hard work and advocacy of groups such as the Sunlight Foundation and the Center for Responsive Politics, as well as the many policymakers and journalists who effected change in response to scandal and controversy, we have datasets that are relatively easy for a computer to compile, leaving us to deal with the interesting work of finding something worthwhile in the data.

While the programming needed to make a computer collect and filter the data is relatively trivial (you might be able to do all of it in under 50 lines of Bash), the collection and filtering of the data is not. As you'll soon see, there's a lot of data, and the origin and purpose of each dataset present various challenges when you try to join them for cross-referencing and analysis.

In other words, even though this data is public and easily accessible, it is not easily usable. In this project, we'll see how much we can make it usable.

The data work

While this is thematically just one project, I've broken up the data-collection parts into their own assignments, as they deal with different data domains and challenges. Splitting it up should also keep you from trying to cram the entire assignment into the night before the due date (TBA, but probably early March).

Each part of this project will have its own page (TBA). For now, the tasks are divided into:

1. Collecting post-employment and historical legislator data
2. Collecting Congressional staff data from expenditure reports
3. Collecting lobbyist data from the public lobbying database

   Be sure to check out the web interface for the Senate lobbying database, as well as the information and guidance on the Lobbying Disclosure Act.

4. Joining the data and finding name matches

The end result of steps 1 through 3 is to create data files that can be used in Step 4. The data files will basically be an arrangement of fields common to all the datasets, and fields particular to each dataset but useful to keep track of:

last_name first_name date description another_description something_else
           

For example, from the Senate expenditure reports, the parsed data fields might look like this:

DOE JOHN N 2012 SECRETARY OF THE SENATE - LEGISLATIVE SERVICES Funding Year 2012 SALARIES, OFFICERS AND EMPLOYEES, SENATE REPORTER OF DEBATES 50,100.25
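To make the idea of common fields concrete, here is a minimal sketch of pulling them out of a parsed record. It assumes the parsed files are pipe-delimited (suggested by the .psv extensions in the project tree, but still an assumption), and the field boundaries, file name, and shortened record below are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: the delimiter, field order, and file name are assumptions.
workdir=$(mktemp -d)

# A shortened, hypothetical pipe-separated version of a parsed record
cat > "$workdir/sample.psv" <<'EOF'
DOE|JOHN N|2012|SECRETARY OF THE SENATE - LEGISLATIVE SERVICES|50,100.25
EOF

# Keep only the fields assumed common to every dataset:
# last name, first name, and date
common_fields=$(cut -d '|' -f 1-3 "$workdir/sample.psv")

# Strip the thousands separator from the amount so it can be treated as a number
amount=$(awk -F '|' '{ gsub(/,/, "", $5); print $5 }' "$workdir/sample.psv")

echo "$common_fields"
echo "$amount"
```

Keeping the common fields in the same order in every parsed file is what should make the join in Step 4 simple.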
           

We've used virtually every technique required to collect and filter the data in past assignments, including:

In particular, this project most resembles the challenges of the death rows assignment, in which you're gathering related data from different sources (and formats) and reconciling them.

Getting better at knowing

One of my favorite segments from The Colbert Report is Better Know a District, in which he does his part to introduce America to our many representatives, letting us know their accomplishments and other important issues, such as their grasp of the Ten Commandments and whether cocaine is fun.

Despite the inherent risk to legislators, the "434-part" series managed to air more than 80 segments. But that just underscores how little we know about each of our sitting legislators, and how many of them there are.

We know even less about past Congressmembers. Between 2005, when "Better Know a District" first aired, and 2014, about 350 House members and 80 Senators left Congress. Some of them retired and others found other work to do. But there's no LinkedIn for former members of Congress. That isn't surprising: since they are no longer being paid by taxpayers to act on our behalf, there is less interest in how former Congressmembers choose to spend their time.

However, what if they spend their time on things that impact the American public in interesting ways? And what if that work is based on, or helped by, the work they did while under the taxpayers' employ? Well, then this becomes an interesting data problem.

More notes to come…

Further reading

OpenSecrets Revolving Door Project, by the Center for Responsive Politics. Some of the results of our homework project will resemble, on a smaller scale, this OpenSecrets project, which tracks where Congressmembers and other public servants end up on K Street. It is an excellently researched and presented project, so look at it as an example of what you can do with the data and the analysis.

Take the Money and Run for Office, This American Life, March 30, 2012

How revolving door lobbyists are taking over K Street, by Lee Drutman and Alexander Furnas, The Sunlight Foundation, Jan. 22, 2014

The Trouble With That Revolving Door, by Thomas B. Edsall for the New York Times, Dec. 18, 2011.

A Revolving Door Where Lobbying Rules Don't Apply, by Dan Morgan for The Washington Post, July 21, 1997

Members of Congress trade in companies while making laws that affect those same firms, by Dan Keating, David S. Fallis, Kimberly Kindy and Scott Higham. This article deals with a non-lobbying angle, but the premise is the same: look at two different Congress-related datasets and find something interesting.

Erring on the side of shady: How calling out “lobbyists” drove them underground, by Tim LaPira for the Sunlight Foundation

All cooled off: As Congress convenes, former colleagues will soon be calling from K Street, by Sunlight Foundation and Center for Responsive Politics

Study shows revolving door of employment between Congress, lobbying firms, by T.W. Farnam, Washington Post, Sept. 13, 2011

Lobbyists call bluff on 'Daschle exemption', by Chris Frates, Politico

Registered lobbyists are mostly compliant - but what about the unregistered ones?, by Sunlight Foundation

Law Doesn’t End Revolving Door on Capitol Hill, by Eric Lipton and Ben Protess

2013 GAO Report: Observations on Lobbyists' Compliance with Disclosure Requirements, by U.S. Government Accountability Office

Project tree

This is probably what the project folder structure will look like:

compciv/
|___homework/
    |___congress-lobbying/
        |___expenditures/
            |___helper.sh
            |___parser.sh
            |___parsed_house_expenditures.psv
            |___parsed_senate_expenditures.psv
            |___data-hold/
                |___senate/
                |___house/
        |___post_employment/
            |___helper.sh
            |___parser.sh
            |___parsed_historical_congress_legislators.psv
            |___parsed_post_senate_employment.psv
            |___parsed_post_house_employment.psv
            |___data-hold/
                |___senate/
                |___house/
        |___public_filings/
            |___helper.sh
            |___parser.sh
            |___parsed_lobbying_filings.psv
            |___data-hold/
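
If you'd like to set up the folders ahead of time, here is a minimal sketch that creates the skeleton above. It builds everything under a temporary directory so it won't touch your real files (that location is my assumption, not a requirement); point project_root at your own compciv/homework folder to use it for real. The parsed .psv files are left out, since your parser scripts are meant to produce those.

```shell
#!/usr/bin/env bash
# Sketch only: builds the tentative folder layout under a throwaway directory.
# Change project_root to your actual compciv/homework/congress-lobbying path.
project_root="$(mktemp -d)/compciv/homework/congress-lobbying"

# mkdir -p creates any missing parent directories along the way
mkdir -p "$project_root/expenditures/data-hold/senate"
mkdir -p "$project_root/expenditures/data-hold/house"
mkdir -p "$project_root/post_employment/data-hold/senate"
mkdir -p "$project_root/post_employment/data-hold/house"
mkdir -p "$project_root/public_filings/data-hold"

# Each part gets an empty helper.sh and parser.sh to fill in later
for part in expenditures post_employment public_filings; do
  touch "$project_root/$part/helper.sh" "$project_root/$part/parser.sh"
done

echo "Created project skeleton at: $project_root"
```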

More notes to come…