The Legal Innovation & Technology Lab's Spot API @ Suffolk Law School - Spot Version: 2022-05-21 (Build 10)
Is Spot Useful?
That depends on what you're using it for. Think of Spot like a map. It doesn't have resolution down to the inch, but it can still be helpful. Like a map, Spot is a model, and as the saying goes, "All models are wrong, but some are useful." Because it's "wrong," those using Spot should recognize that its output is not the final word, and if they want to know if it's useful, they'll have to ask "compared to what?" That being said, what you probably want to know is, "how wrong is it?"
Unfortunately, you can't just look at Spot's accuracy. To understand why, let's say you want to evaluate an algorithm that predicts if there will be a snow day tomorrow. I have an algorithm with an accuracy of 98%. Impressive, right? What if I told you my algorithm was "always guess no"? My model is 98% accurate because snow days only happen 2% of the time. To know if my algorithm is any good, you need to know more. So you might ask what percentage of actual snow days I "caught." This is something called recall. The answer is 0%. You could also ask how often I'm right when I say something is a snow day. This is something called precision. Since I didn't actually predict any snow days, I can't even calculate this number because I'd have to divide by zero. Either way, these alternative metrics make it clear that my model is no good. And none of that even takes into account that I may have a preference for false positives over false negatives or vice versa. Consider a screening test for some ailment. The idea is that it's one step in a process, a filter used to identify folks for a diagnostic test (i.e., it's not the final word). In such a case you might care more about minimizing false negatives than you do about false positives. The point is, it's complicated.
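To make that arithmetic concrete, here's a small sketch (in Python, not part of the Spot API) that computes accuracy and recall for the "always guess no" snow-day model and shows why precision can't even be calculated:

```python
# Illustrative sketch (not part of the Spot API): accuracy vs. recall and
# precision for the "always guess no" snow-day model described above.

# 100 days, 2 of which are actual snow days (2%).
actual = [True] * 2 + [False] * 98
predicted = [False] * 100  # the model always guesses "no snow day"

tp = sum(a and p for a, p in zip(actual, predicted))          # true positives
fp = sum((not a) and p for a, p in zip(actual, predicted))    # false positives
fn = sum(a and not p for a, p in zip(actual, predicted))      # false negatives
tn = sum(not a and not p for a, p in zip(actual, predicted))  # true negatives

accuracy = (tp + tn) / len(actual)  # 0.98 -- looks great, but...
recall = tp / (tp + fn)             # 0.0  -- it catches no snow days at all
# Precision (tp / (tp + fp)) is undefined here: the model never predicts
# a snow day, so the denominator is zero.
print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```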
You can see detailed performance metrics for each of our active labels by pinging the API's taxonomy method. It returns a JSON string with details about each label. See https://spot.suffolklitlab.org/v0/taxonomy/
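For example, here's a minimal sketch of calling that endpoint with Python's requests library. The field names used below ("name", "precision", "recall") and the assumption that the response is a list of label objects are guesses about the response shape, so inspect the returned JSON for the actual structure:

```python
# Minimal sketch: fetch Spot's taxonomy and print label-level metrics.
# The field names ("name", "precision", "recall") and the list-of-objects
# shape are assumptions; inspect the returned JSON for the actual structure.
import requests

resp = requests.get("https://spot.suffolklitlab.org/v0/taxonomy/")
resp.raise_for_status()
taxonomy = resp.json()

for label in taxonomy:
    print(label.get("name"),
          "precision:", label.get("precision", "TBD"),
          "recall:", label.get("recall", "TBD"))
```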
Those performance metrics, however, assume you treat the model as predicting the presence of a label whenever it reports more than a 50% chance of the label being there. If you change the API's cutoff, you can favor recall over precision, or the other way around. So API users should think carefully about their use case and what cutoff is appropriate. See Notes on Uncertainty below.
Active Labels
The average weighted† accuracy for our active labels is currently 94.0%. For recall, it's 78.0%, and precision is 76.0%. That "active labels" qualifier restricts things to only those labels for which we can make predictions better than a coin flip or always guessing yes/no, and it includes only the following 120 labels:
- Public Benefits; Disability public benefits; State and local benefits for the disabled; Social Security Disability Insurance (SSDI); Food and Cash benefits...
Remember, you can see every LIST issue and its performance metrics by calling the API's taxonomy method. Only active labels will show performance data; all others will return TBD. See https://spot.suffolklitlab.org/v0/taxonomy/
LIST (formerly NSMIv2)
The following is a list of the parent and first-level child labels Spot is currently learning (click through to see the first level of children). Children of children are not shown here but can be seen in the Active Labels list above. Highlighted labels are active labels (i.e., green labels have F1 >= 0.83, pea green labels have F1 >= 0.63, and yellow labels have F1 >= 0.50). Please note, this is not a complete list of all issues in the world; it includes only the issues that have been finalized for inclusion in LIST at this point, so do not assume the subtopics are all-inclusive. LIST is in active development, and more issues are added regularly. Note: label descriptions are provided when available.
We're always working to grow our list of active labels and improve the metrics associated with each. If you'd like to help, you can play a few rounds of Learned Hands or share relevant data to help grow our training set.
Notes on Uncertainty
When Spot returns a list of issues, it provides a prediction accompanied by a lower and upper bound for that prediction. See Documentation. The bounds are intended to give users information about the uncertainty of a given prediction, allowing them to tailor results to their use case and tolerance for error. The larger the distance between the lower bound and the prediction, the more likely it is that the prediction is overly optimistic about finding the issue. Conversely, the larger the distance between the prediction and the upper bound, the more likely it is that the prediction underestimates the chance that the issue is present. Users are encouraged to consider this when setting the cutoffs for triggering the return of an issue. For example, a user interested in avoiding false negatives may set Spot to return results whenever the upper bound is greater than 50%, while one interested in avoiding false positives might base their cutoff on the lower bound. The distance between the lower and upper bounds communicates something about the overall uncertainty of a prediction, with the most certain predictions being those with the smallest distance between the two.
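As a sketch of how a client might apply those two cutoff strategies to Spot's output (the "lower"/"pred"/"upper" keys below are a hypothetical shape chosen for illustration; see the Documentation for the actual response format):

```python
# Sketch: applying different cutoff strategies to predictions that carry
# lower/upper bounds. The keys ("label", "lower", "pred", "upper") are a
# hypothetical shape for illustration, not the documented response format.
predictions = [
    {"label": "Housing", "lower": 0.42, "pred": 0.61, "upper": 0.74},
    {"label": "Consumer", "lower": 0.10, "pred": 0.30, "upper": 0.55},
]

CUTOFF = 0.50

# Favor recall (avoid false negatives): keep any issue whose *upper* bound
# clears the cutoff, even if the point prediction does not.
recall_oriented = [p for p in predictions if p["upper"] > CUTOFF]

# Favor precision (avoid false positives): require the *lower* bound itself
# to clear the cutoff.
precision_oriented = [p for p in predictions if p["lower"] > CUTOFF]

print([p["label"] for p in recall_oriented])     # ['Housing', 'Consumer']
print([p["label"] for p in precision_oriented])  # []
```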
Currently, lower bounds are calculated by multiplying a model's precision for a given label by its prediction. Upper bounds are calculated by multiplying the prediction of the model's inverse (i.e., the model where positive and negative classifications are swapped) by its precision and subtracting that product from one.
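In code, that calculation looks roughly like the following sketch; the variable names and example numbers are mine, and in practice the API performs this arithmetic server-side and returns the bounds with each prediction:

```python
# Sketch of the bound arithmetic described above. Variable names and the
# example numbers are illustrative only; the API computes these bounds
# server-side and returns them with each prediction.

def lower_bound(prediction: float, precision: float) -> float:
    # Lower bound: the label's precision times the model's prediction.
    return precision * prediction

def upper_bound(inverse_prediction: float, inverse_precision: float) -> float:
    # Upper bound: one minus (the inverse model's prediction times the
    # inverse model's precision), where the inverse model swaps the
    # positive and negative classes.
    return 1 - inverse_precision * inverse_prediction

# Example: a 0.61 prediction with 0.76 precision, and an inverse model
# that predicts 0.25 with 0.90 precision.
print(lower_bound(0.61, 0.76))   # ~0.4636
print(upper_bound(0.25, 0.90))   # ~0.775
```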
Take Spot for a Test Drive
If you'd like to try Spot without making an account, feel free to take it for a test drive.
† Values are weighted based on the number of affirmative examples for each issue in our dataset.