Our Story

This site grew out of a Twin Cities Open Data project proposal. The goal is to help tenants and city government recognize when a landlord owns many parcels. In some cases a landlord owns hundreds of properties through shell corporations. Knowing the full extent of a landlord's holdings gives better visibility into the landlord's behavior and can be used to pressure the landlord.

Data and Technology

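Currently, this site is a mashup of four datasets: the Metro Parcel data set, the rental license data set, the Secretary of State's business data, and the regulatory service data from Minneapolis.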

A parcel is the core unit of our analysis because it is the unit of ownership. Multiple parcels are linked together into a portfolio when there is evidence that the parcels are owned by a single underlying entity. The Metro Parcel data set has a variety of fields that are used for linkage, such as owner_name and owner_address. The rental license data set has some additional fields, such as email and phone.

All of these fields require data scrubbing because there is variability in how they are entered. A phone number might be entered as (612)555-1234 on one record and as 612.555.1234 on another record. Names, both personal and business, may vary in punctuation, abbreviations and middle initials. Addresses vary in their format and use of abbreviations. During the data scrubbing, we try to eliminate irrelevant variability and keep the important variability.

Once the parcels have been "tagged" with the scrubbed attributes, we link parcels that share a common tag. For example, parcels A and B might be linked because they are both associated with an owner or taxpayer "Jo Richards", and parcels B and C might be linked because both list the email address j.richards@gmail.com. In this case, parcels A, B and C would be placed into a single portfolio.
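The scrub-tag-link idea can be illustrated with a minimal Python sketch. It is not taken from the site's source code: the function names (scrub_phone, scrub_name, build_portfolios), the (kind, value) tag format and the union-find grouping are all assumptions chosen for illustration, and the real scrubbing rules are certainly more involved.

```python
import re
from collections import defaultdict


def scrub_phone(raw):
    """Keep digits only, so (612)555-1234 and 612.555.1234 match."""
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else None


def scrub_name(raw):
    """Lowercase, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    return " ".join(cleaned.split())


def build_portfolios(parcel_tags):
    """Group parcels that share a tag, directly or through a chain.

    parcel_tags maps a parcel id to a set of (kind, value) tags.
    (Hypothetical structure, assumed for this sketch.)
    """
    parent = {parcel: parcel for parcel in parcel_tags}

    def find(p):
        # Follow parent pointers to the group's representative.
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    first_seen = {}  # tag -> first parcel that carried it
    for parcel, tags in parcel_tags.items():
        for tag in tags:
            if tag in first_seen:
                # Same tag on two parcels: merge their groups.
                parent[find(parcel)] = find(first_seen[tag])
            else:
                first_seen[tag] = parcel

    portfolios = defaultdict(set)
    for parcel in parcel_tags:
        portfolios[find(parcel)].add(parcel)
    return sorted(sorted(group) for group in portfolios.values())


# Made-up example records, not real data.
parcel_tags = {
    "A": {("name", scrub_name("Jo Richards"))},
    "B": {("name", scrub_name("JO RICHARDS.")),
          ("email", "j.richards@gmail.com")},
    "C": {("email", "j.richards@gmail.com")},
    "D": {("phone", scrub_phone("(612)555-1234"))},
    "E": {("phone", scrub_phone("612.555.1234"))},
}
portfolios = build_portfolios(parcel_tags)
print(portfolios)
```

Here A and C never share a tag directly, yet they land in the same portfolio because each is linked to B. That transitive behavior is also why a single overly generic tag can cause over-linkage.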

The Secretary of State's business data is used to generate a secondary set of tags for a parcel. If a parcel owner's name can be matched to a corporation, the names and addresses associated with the corporation are tagged to the parcel. This allows us to link parcels that are owned by multiple shell corporations that list a single corporate owner. We recognize that this process of cleaning-tagging-linking is imperfect, and so our system provides some tools to address both over-linkage and under-linkage.
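As a rough sketch of how secondary tags could be derived, assume the business data has been reduced to records with a business name plus one associated party name and address. The field names (business_name, party_name, party_address), the helper names and the sample records below are hypothetical and are not the actual Secretary of State schema.

```python
def scrub(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ")
    return " ".join(kept.split())


# Made-up business records with assumed field names.
business_records = [
    {"business_name": "Elm Street Holdings LLC",
     "party_name": "Jo Richards",
     "party_address": "100 Main St, Minneapolis"},
    {"business_name": "Oak Avenue Properties LLC",
     "party_name": "Jo Richards",
     "party_address": "100 Main St, Minneapolis"},
]


def secondary_tags(owner_name, records):
    """Return tags from every business whose scrubbed name matches the owner."""
    key = scrub(owner_name)
    tags = set()
    for record in records:
        if scrub(record["business_name"]) == key:
            tags.add(("name", scrub(record["party_name"])))
            tags.add(("address", scrub(record["party_address"])))
    return tags


# Two parcels owned by different shell corporations.
tags_1 = secondary_tags("ELM STREET HOLDINGS, LLC", business_records)
tags_2 = secondary_tags("Oak Avenue Properties LLC", business_records)
shared = tags_1 & tags_2
print(shared)
```

The two owner names differ, so the primary tags alone would not link the parcels. The shared secondary tags (the same party name and address behind both corporations) are what connect them.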

The regulatory service data from Minneapolis does not provide additional tags for a parcel, but it does enrich our understanding of the parcel and, indirectly, the owner. Because we do not have historical ownership information, we cannot tell which owner was responsible for violations that occurred before the most recent purchase.

The raw data are downloaded as CSV files and shapefiles, brought into Python using the pandas package, pre-processed and stored in Azure ADLS. The user-facing web application is written in Django/Python and hosted on Azure. The source code is available on GitHub.

Caveats