This page contains the project plan for a software prior art initiative under consideration. This page is "in progress." We welcome any constructive input. If you see sections that need help, of which there are many, or you have suggested improvements, please make the changes.

Meetings

June 25th 2008

tag list, tag definitions
Which things should be mandatory?
- We can come up with the schema and figure out what should be required
Need to email someone or request review?
- Other people would need to email others or invite to a document.
Figure out a way to handle disambiguation (like Wikipedia!)
- Ask the Wikipedia guys how they do that. :)
- Encourage participation via a null-set splash page, etc.
Define what canned reports we'll want to pull from the database.
- Category populations & totals.
Teaser for people to help complete incomplete documents.
- Know if something is incomplete.
User karma.
- Participation points for posts edited.
Article watching.
- Tag an article and add to your watch list.
Abuse management.
- User registration throttling and verification.
- Locking articles if necessary (or disabling comment threads).
- Revision history.
- Ability to report an offensive comment?
  - Digg-like voting? Comment usefulness?
- Ability to report a user.
Threaded comment system for flame wars.
- Becomes a part of permanent record.

Summary

A user-generated content (UGC) tool will let accused patent infringers and interested parties assess and evaluate the validity of software patents with an open catalog system. Users will be able to search and compare descriptions (tags) of non-patented prior art references, including both commercial and non-commercial software implementations, against defined elements of a patent. The tools are designed for any interested party or defendant to use to find invalidating prior art software. The UGC is posted by any interested party and tagged by the poster in enough detail that the internal processes, modules, and structures of the software reference can be easily matched against patent claim elements. The catalog would be populated both incrementally by interested parties who wish to record their software developments (commercial and non-commercial) and by data dumps from prior patent litigation cases. Each domain area and associated taxonomy can grow incrementally with the assistance of community editorial groups.

Goals

Create a free, open, searchable database of non-patented software prior art to help give proper credit and prevent invalid patents from being asserted.

Provide software developers a means to document, define, and record their inventions without filing a patent.

Prevent others from subsequently filing patents on innovations developed first by someone else.

Note: The above is true mostly for the US, where prior art takes precedence over filing date. In many countries it is the date of filing that matters.

Problem Statement

The history of innovations in software is not well recorded. Consequently, when presented with a patent whose validity is in question, it is difficult and expensive to identify and find non-patent prior art which may invalidate the patent in question.

It is difficult because the process of identifying and finding invalidating prior art is contingent upon effectively comparing the elements, modules, and processes in a software reference against the specific claims of a patent. Because the source code of the vast majority of software references is not initially available except by subpoena, there is no effective way to determine whether a software reference discloses the elements of a patent claim. As stated by the Software Patent Institute, "...What is needed is not the detailed code but some level of description of what is in the code. ... To effectively find prior art, algorithms, control flow, data structures, and underlying processes must be transparent in commercial and non-commercial code, which today it is not."

The result, in conjunction with other systemic factors in both the patent examination process and the litigation process, is that:

Too many software patents are issued that are, in fact, neither novel nor non-obvious. This diminishes the integrity of the patent process and allows the unjust extraction of value from a commercial ecosystem.

The assertion and enforcement of such patents imposes significant and material legal defense costs on defendants, particularly for emerging companies and start-ups, which causes the diversion of resources from innovation and development to non-productive litigation defense.

Benefits

Ensure that only valid patents are asserted and enforceable.

Ensure that the actual inventors are credited with their innovations. An invalid software patent grants control over inventions to people who didn’t create them.

Make the prior art value of real software more accessible, because most coded software innovation has not been patented or otherwise adequately documented to date.

Reduce cost (time and money) of identifying prior art.

Project Scope

What's Included

The database should include the following kinds of software and corresponding meta-data:

Commercial Software Products
Non-commercial Software Products
Published papers and research projects
Open source projects
Technical Disclosures

The database can also contain either: 1) links to a trusted repository where the reference is stored; or 2) a copy of the reference itself.

What's Not Included

The database is not intended to address patented publications, although it could (e.g., relevant patent references can be included as meta-tags to software references included in the database). (There are already good search tools and efforts to make patent searches better.)

Intended Audience

Software Developers Seeking to Understand or Obtain the Prior Art
Members of Open Source Communities Seeking to Share Innovations
Individuals or Entities Subject to a Patent Suit or Adversarial License Negotiations
Extendable to the USPTO for Researching Prior Art in connection with the Examination of Patent Applications claiming software or software-related inventions
Extendable to All Persons Researching Prior Art in Software or Other Technology fields

Milestones

Schematics and Partners. The initial phase is currently underway: to form partners, design the parameters, and create schematics of the system. An informative public website will accompany this phase.

Alpha Version. We intend to build and implement the most basic features into the system this year. Let's call this first rough implementation an "Alpha" version.

Testing. Once that's complete, we want to test the Use Cases and see if it works as intended, and identify critical enhancements and/or obstacles. This testing needs to include potential users (developers, defendants, attorneys).

Evaluation. Based on the feedback, we need to determine if the project is viable, and if so, what enhancements are needed to overcome key obstacles. If it doesn't work or we can't make the system easy enough so it can succeed, we need to try something else. If it is viable, we will make critical changes, and scale up testing again to a broader and larger audience.

Beta Version. Wash and Repeat.

Use Cases

Diagrams

Simple diagram

Searching for a Reference

A user knows of a software patent and wishes to find prior art related to the patent.
Prior art in this case is one or more references (document, executable code, or description of software/system) which contain all of the elements of at least one claim of the patent.
The user knows the patent claim elements.
The user accesses the database to perform a search using the claim elements.
A search screen is presented that allows the user to search on multiple fields which correspond to the patent claim elements as well as other meta-data such as date, author, field of use.
The claim elements for the search are presented as predetermined tags from a drop down list.
The user can add additional tags which, after review by a moderator, are added to the tagging schema.
Search results are returned which reflect matches to the search criteria.
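
The search flow above can be sketched as a set-containment check: a reference is a candidate when its tags cover every claim element the user searched on. This is a minimal illustration under our own assumptions. The `Reference` class, the `find_candidates` function, and the sample tags are hypothetical and are not part of the plan.

```python
from dataclasses import dataclass, field


@dataclass
class Reference:
    """Hypothetical catalog entry: a name plus the tags describing it."""
    name: str
    tags: set = field(default_factory=set)


def find_candidates(references, claim_elements):
    """Return references whose tags cover every claim element searched."""
    wanted = {element.lower() for element in claim_elements}
    return [
        ref for ref in references
        if wanted <= {tag.lower() for tag in ref.tags}
    ]


# Made-up sample data for illustration only.
catalog = [
    Reference("EditorA", {"undo", "clipboard", "spellcheck"}),
    Reference("EditorB", {"undo", "macros"}),
]
matches = find_candidates(catalog, ["Undo", "Clipboard"])
print([ref.name for ref in matches])
```

Requiring every element to match mirrors the definition of prior art used in this use case (all elements of at least one claim). A real search would probably also rank partial matches.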

Reference Input

Incremental Prior Art includes those printed publications which are self-contained, such as a .pdf of an article, a segment of code, or any other document that has its material contained within the document's four corners online or offline.

A User Wants to Record A Software Development

User creates a software development:

Note whether other references exist with this exact name;
Provide for search to check for tags or spellings;
Suggest other categories of references and tags that are similar subject matter;
Suggest other references under the category or tag searched initially;
Suggest Search of Category or Tag searched initially in existing references;
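
The name and spelling checks in the list above could be prototyped with Python's standard `difflib` module. This is only a sketch: the `check_name` function, the 0.8 similarity cutoff, and the sample names are our assumptions, and a production system would likely use the search backend's own fuzzy matching.

```python
import difflib


def check_name(new_name, existing_names, cutoff=0.8):
    """Report an exact (case-insensitive) name clash and near-miss spellings."""
    lowered = {name.lower(): name for name in existing_names}
    exact = lowered.get(new_name.lower())
    similar = difflib.get_close_matches(
        new_name.lower(), list(lowered), n=5, cutoff=cutoff
    )
    # Map matches back to their original spelling, excluding the exact hit.
    suggestions = [lowered[s] for s in similar if lowered[s] != exact]
    return exact, suggestions


# Made-up reference names for illustration only.
existing = ["GridSolver", "TextIndexer"]
print(check_name("gridsolver", existing))
print(check_name("GridSolvr", existing))
```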

Opening Web Page has Blanks to prompt submissions in the following categories:

Description that describes in plain language (<50 words) the subject matter of the development;
Relevant excerpt of novel provisions (e.g., sentence around tagged area);
Copy of document or code itself;
Executable file or permalink to same;
Other related documents and references.

Additional Requirements:

Place for these documents to sit.
Create functionality that automatically grabs files from the URL and brings them into our repository.

User Wants to Tag a Document So Later Users Can Find It

A user knows of prior art related to a software patent or believes that the reference is novel.
Prior art in this case is one or more references (document, executable code, or description of software/system) which contain all of the elements of at least one claim of the patent.
An input screen is presented that allows the user to enter and tag on multiple fields which correspond to the patent claim elements as well as other meta-data such as date, author, field of use. Suggested tags are presented on a drop-down menu.
Each of these multiple fields that corresponds with a field on another piece of prior art will link through to a web-page that allows users to add content and English-language descriptions in a wiki-style format.
Each of these web pages allows the users to create links between them, such that where a synonym for a term is used, links to the related terms appear at the top of each page.

User Wants to Record A Collection of Prior Art References

Prior Art Collected in Litigation. Defendants in patent litigation collect substantial volumes of prior art during the course of the case. These are in the form of documents, actual hardware or software items, and claim charts which describe how the references disclose elements of the patents in suit. These references are ideal sources of data to populate the prior art repository. In some cases, the prior art has already been coded electronically in a proprietary database, so the prior art repository should support some form of data import.

PACER Database. Case files, such as those maintained on PACER in active federal court patent litigation, contain a wealth of information. For example, answers to interrogatories often include a laundry list of potential prior art. To collect and maintain the most information with limited input, it is necessary to tag and catalog these documents.

Components

Online Database (Freebase) that functions as a repository for UGC, including meta-data and, optionally, the actual code or a link to a repository.

Search interface that supports multi-element search against meta-data. Should support pattern matching, i.e., patent claim elements vs. meta-data tags.

Input interface to support input of references (single input and bulk input).

Taxonomy which can be used to index and tag references. Must be easily extensible.

Hosting. Hosting facility for online database.

User Registration module for log-in, user preference management, permissions management.

Tags & Meta-data

What is a Tag?

Tags are keywords, but they link through to wiki-style information pages. Some of the tags (see bottom of this section) are predetermined. These are Tags that we've chosen to provide some consistency across pages and categories. But others may be chosen by the user. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders.

For example, if you save an article about how to write a certain kind of program from a workstation, you can tag it with terminology describing the program and also with workstation, server, microcomputer, Unix, RISC, or whatever other tags you might use to find it again. You don't have to rely on the designer of a system to provide you with a category for computer programs or workstations. You make up tags as you need them, and use the tags that make the most sense to you.

This is great for organizing and finding prior art and new publications, but it goes even further when someone else posts related content using the same tags. You begin building a collaborative repository of related information, driven by personal interests and creative organization. Now, as the same tags are linked to, you can find all of the prior art sharing common tags and add to the information pages about them.

How do I tag?

When saving a new publication authored by you, create a field for tags. In this field, enter as many tags as you would like, each separated by a space. Recommended tags are a combination of tags you have already used and tags that other people have used. You are under no obligation to use these! They are only there to help you. What tags or words would help you remember this page a few years from now? That's a good place to start.

The only limitation on tags is that they must not include spaces. So if your program is written from a Unix workstation, it should get two separate tags if it could run on any workstation and this one happens to be on Unix, or one tag, "Unixworkstation", if that is the only place the program works. You probably don't want to use commas, though, since a comma will become part of the tag.

Finally, any reference that is newly created, or that has been found by a user who doesn't have time to make a more in-depth analysis, can be tagged initially with "PatentProof." This will allow it to be picked up by search engines and tagged by others.
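
The tag-entry rule described in this section (tags separated by spaces, with a comma becoming part of the tag) can be sketched as follows. The function names are hypothetical, and dropping case-insensitive duplicates is our own assumption, not something the plan specifies.

```python
def parse_tags(field_text):
    """Split a tag field on whitespace; duplicates are dropped, order kept."""
    seen = set()
    tags = []
    for tag in field_text.split():
        key = tag.lower()
        if key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags


def tags_with_commas(tags):
    """Flag tags containing a comma, which is probably a typing mistake."""
    return [tag for tag in tags if "," in tag]


tags = parse_tags("Unix workstation, RISC unix")
print(tags)
print(tags_with_commas(tags))
```

Flagging comma-bearing tags instead of silently stripping the comma keeps the stated rule intact while letting the input screen warn the user.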

Functional Requirements

Operation

Hosted database accessible via the web
Username/password login required to edit/modify existing entries and for admin access
No registration required for search or data input

Appearance & Performance

Target 98% uptime
Return search results fast
Target 20 simultaneous users
Estimated number of references - 500k to 1MM.

Web Content

We need the capacity to include at least the following:

Search results/output
FAQ
Instructions for how to use the system (text, video, exemplars)

We need to figure out:

How many items of information appear on a single output page (e.g. tabs vs. hyperlinks)
What kind of interface would convey the information more succinctly and intuitively
How the user would interact with the output to provide feedback on it, edit it, add to it, etc.

Data Input

[P1] Input meta-data which describe a reference (a software application), e.g., date, name, etc.
[P1] Input Tags which describe reference functionality
[P1] Reference functionality includes features of the software that describe the processes, modules, functions of the software.
[P1] Upload electronic copies of reference documents (PDF, DOC, XLS, PPT, Google Docs, TIF, JPEG)
[P1] Reference may consist solely of a written description in narrative form
[P1] Reference may have multiple tags
[P1] User can select from predetermined list of tags
[P1] Import and use existing tagging schemes (e.g., OSAPA, Glossary of Computerized System and Software Development Terminology)
[P2] Determine if meta-data already exists: if the reference is not new, present similar records; if it is new, allow new input.
[P2] Suggest tags related to meta-data which describes a reference.
[P2] User can add new tags and the new tags become part of the pre-select list
[P2] System has semantic pattern function to present related words when a user inputs a tag, e.g., the user inputs "data storage" and the system displays related words for the user to select (storage: RAM, memory, etc.).
[P2] Admin capability to modify/delete/add tags to the list
[P3] Determine level of input that is mandatory
[P3] Automatically OCR electronic references
[P3] Consider Randomization Techniques
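
As a rough illustration of the [P2] related-words requirement above, a hand-maintained table of related terms can stand in for real semantic matching. This is a plain lookup table, not a semantic engine. The function names are hypothetical, and the single sample group reuses the storage/RAM/memory example from the list.

```python
from collections import defaultdict


def build_index(groups):
    """Map each lowercased term to the other terms in its group."""
    index = defaultdict(set)
    for group in groups:
        for term in group:
            index[term.lower()].update(t for t in group if t != term)
    return index


def suggest_related(user_input, index):
    """Return related terms for every word in the user's input, sorted."""
    found = set()
    for word in user_input.lower().split():
        found |= index.get(word, set())
    return sorted(found)


GROUPS = [["storage", "RAM", "memory"]]
index = build_index(GROUPS)
print(suggest_related("data storage", index))
```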

Meta-Data

There are two types: i) fixed attributes, which describe the reference along with other administrative data related to the objects; and ii) extensible attributes, more subjective, which describe the functions, purpose, and relatedness of a reference.

Fixed/Objective Characteristics

Below is an initial list of the kinds of Tags we've chosen that should be used to describe the references wherever the information is known.

Reference Name (Fixed)
Software Type Category(s)
Sub-Type Category(s)
Purpose of the Software
Owner (Fixed)
Author of the Software (Fixed)
Earliest Date of Use or Public Disclosure (Fixed)
Other Resources (Experts Knowledgeable About the Software)
- Names
- email contact
Link pointing to an archive copy or trusted repository (suggest SourceForge)
Related Patent Nos. (Fixed) (may include one or more)
[P4] Patent Number Returns Scraped Fields (e.g., Filing Dates, Priority Dates, Critical Dates)
[P2] Time-stamp function to track and report date the record was created and published.

Extensible/Subjective Characteristics

Software Type Category(s)
Sub-Type Category(s)
Purpose of the Software
Description of the software
Tags describing the functional elements of the software
Tags to list the kinds of other software, systems, hardware the reference is/could be used with
Related Patent Nos. (opportunity to explain how related)
Cross-reference to other references that may be within the database or outside
Auto-suggestion feature to suggest cross-references
Semantic pattern matching function as described above
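
One way to picture the two kinds of meta-data is a single record type with a fixed part and an extensible part. The sketch below is an assumption about how such a record might look, not a proposed schema. The class name, field names, and the choice of which fixed fields count as "missing" are all hypothetical, loosely following the two lists above.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class ReferenceRecord:
    # Fixed/objective attributes
    name: str
    owner: Optional[str] = None
    author: Optional[str] = None
    earliest_disclosure: Optional[date] = None
    archive_link: Optional[str] = None
    related_patents: list = field(default_factory=list)
    # [P2] time stamp recorded when the record is created
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
    # Extensible/subjective attributes
    categories: list = field(default_factory=list)
    description: str = ""
    functional_tags: set = field(default_factory=set)
    works_with_tags: set = field(default_factory=set)
    cross_references: list = field(default_factory=list)

    def missing_fixed_fields(self):
        """List fixed fields still empty, e.g. to flag incomplete records."""
        checks = {
            "owner": self.owner,
            "author": self.author,
            "earliest_disclosure": self.earliest_disclosure,
        }
        return [name for name, value in checks.items() if not value]


# Made-up record for illustration only.
record = ReferenceRecord(name="ExampleApp", author="A. Developer")
print(record.missing_fixed_fields())
```

A check like `missing_fixed_fields` could also feed the "teaser" idea from the meeting notes, which asks for a way to know when a document is incomplete.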

Searching

Default Search

Text box for entry of keywords or patent numbers (optional text entry)
Should support, in the basic search option, a choice between searching tags/metadata only and searching the text of claims and equivalent resources only.
Should support an option for semantic search in the advanced search option, e.g., RAM would return ROM.
Should support search across any combination of the above simultaneously in an advanced search option (e.g., with check boxes for each option)
Should support boolean operators and common syntax such as +, -, "", for excluding and requiring terms or phrases
Must include the option to constrain the search based on required meta data fields using pre-populated pull down lists
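
The `+`, `-`, and `""` syntax mentioned above could be parsed along these lines. This is a sketch under our own assumptions: the function names are hypothetical, matching is plain case-insensitive substring search, and unmarked terms are treated as "at least one must match" only when no `+` terms are given.

```python
import re

# Optional sign, then either a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


def parse_query(query):
    """Split a query into required (+), excluded (-), and optional terms."""
    parsed = {"required": [], "excluded": [], "optional": []}
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if not term:
            continue
        bucket = {"+": "required", "-": "excluded"}.get(sign, "optional")
        parsed[bucket].append(term)
    return parsed


def matches(text, parsed):
    """Apply the parsed query to a text using simple substring tests."""
    text = text.lower()
    if any(term in text for term in parsed["excluded"]):
        return False
    if not all(term in text for term in parsed["required"]):
        return False
    # With no required terms, at least one optional term must appear.
    if not parsed["required"] and parsed["optional"]:
        return any(term in text for term in parsed["optional"])
    return True


query = parse_query('+undo -"flash memory" cache')
print(query)
print(matches("Undo stack with a disk cache", query))
```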

Default Results

20 results per page, sorted by relevancy, if keywords are included, or by date, if metadata alone is searched
Each result listed should include required meta data and a brief text snippet describing the resource, when available
Should be able to filter results based on one or more of the required meta data fields
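
The default ordering and paging rules above can be sketched as follows. The result fields (`score`, `date`) and function names are hypothetical. The plan does not say whether date ordering is oldest-first or newest-first, so oldest-first here is purely our assumption.

```python
from datetime import date

PAGE_SIZE = 20  # default results per page, per the list above


def order_results(results, has_keywords):
    """Sort by relevancy when keywords were used, otherwise by date."""
    if has_keywords:
        return sorted(results, key=lambda r: r["score"], reverse=True)
    # Assumption: oldest first, since earlier disclosure dates matter most.
    return sorted(results, key=lambda r: r["date"])


def page(results, number, size=PAGE_SIZE):
    """Return the 1-indexed page `number` of an ordered result list."""
    start = (number - 1) * size
    return results[start:start + size]


# Made-up results for illustration only.
results = [
    {"name": f"ref{i}", "score": i, "date": date(2000, 1, 1 + i)}
    for i in range(25)
]
ordered = order_results(results, has_keywords=True)
print(len(page(ordered, 1)), len(page(ordered, 2)))
```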

Results for searches on Patent numbers

Should be able to support an alternate search results UI template for searches on patent numbers

Advanced Preferences/Nice to Haves:

Users should be able to select number of results to return per page

Reporting

Categories should be user-definable (add/edit/delete functions).
Field to associate any event (meeting, address, call, etc.) with a category. There could be some default categories (e.g., Family, Friends, Work, Clients, Subcontractor, Restaurants, Computer) or perhaps example category sets for different purposes (e.g., business, software development, education, single person, family).
Events can be filtered based on categories.
Events can be associated with multiple categories.
Events should be able to be color-coded by category.

Exporting

Ability to export data to common data formats (xls, csv)
Ability to export only selected events.
Export to HTML (pretty much the same as Print).
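
A CSV export of selected records could be as small as the sketch below, which uses Python's standard `csv` module. The record layout, the `export_csv` name, and the choice to join tags with spaces are our assumptions for illustration.

```python
import csv
import io


def export_csv(records, columns):
    """Write the chosen columns of each record to a CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # Join multi-valued tags with spaces, since tags cannot contain spaces.
        if isinstance(row.get("tags"), (list, set, tuple)):
            row["tags"] = " ".join(sorted(row["tags"]))
        writer.writerow(row)
    return buffer.getvalue()


# Made-up record for illustration only.
selected = [{"name": "ExampleApp", "tags": {"undo", "cache"}, "internal_id": 7}]
print(export_csv(selected, ["name", "tags"]))
```

Passing only the selected records and the wanted columns covers the "export only selected" requirement without a separate code path.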

Printing

Print search results or descriptions of references

Administrative Console

Manage taxonomies
Create reports
Run reports
Allocate permissions to others to manage domain areas in a given taxonomy

Outreach and Activism

Working to get the word out to the technological community and to the legal community is an important aspect to creating a large-scale database that is usable to the public. Suggestions for how to do so are as follows:

Make an effort to consider technologists' motivations and incentives to tag their work, so that they can be proactive rather than reactive. We seek to inform innovators of the benefits of sharing their work with others and, at the same time, of the benefit of officially disclosing art in a manner that may prevent future patentees from seeking coverage of the same. Building a shared knowledge base will equip the public to better handle potential patent suits.
Create Contact List(s) for Partners, Potential Experts, and Existing Prior Art Databases
Develop Talking Points Memo
Project Name and URL for Holding Database
Gather Taxonomy

How Can You Get Involved

We need to come up with a good name for this project, one that is unique and yet describes what it is.

Weekly Call

There is a weekly project team call on Wednesdays at 11am PDT. If you would like the dial-in, please submit info here and we'll send you the dial-in details.

Expertise

Taxonomy expertise (technical and domain)
Legal
Webdev
Data architect
UI Design
Search Design

Resources

Challenges and Criticisms

One overarching concern is that the body of prior art is vast, poorly organized, undocumented, and difficult to describe, and that the project won't attract community participation. The beneficiaries of a prior art database are patent applicants, defendants in future infringement cases, and the individuals or companies that would have been the defendants of infringement suits that the prior art database prevented by preventing the relevant patents from ever issuing. The first group includes adversaries, and participants have no way to tell if they fall into either of the other two groups.
We need tools to facilitate community participation of the sort we see today. Today we also have, in open source, something that merits protection, and that may overcome resistance to participating in an enterprise that appears to legitimize software patents (a big problem for SPI, which had a number of corporate backers).
An alternative would be a rapid response system that would react only to clear threats, such as pending litigation.
Another alternative would be decentralized disclosure: training developers to publish articles and source code so as to make them useful as prior art references, but in a variety of journals and repositories.
Richard Stallman, in Anatomy of a Trivial Patent, writes, "What's more, the courts are reluctant to overrule the Patent Office, so there is a better chance of getting a patent overturned if you can show a court prior art that the Patent Office did not consider. If the courts are willing to entertain a higher standard in judging unobviousness, it helps to save the prior art for them. Thus, the proposals to "make the system work better" by providing the Patent Office with a better database of prior art could instead make things worse."
By helping applicants write claims to avoid prior art, a prior art database would reduce the average cost of a valid patent. Since most new patents are not licensed for free software use, a prior art database could increase the total patent threat to free software.
If prior art databases actually hurt patent trolls, patent trolls would have found a reason to complain about them.

Project Team

Harvey Anderson, Mozilla Corporation
Emily Berger, Electronic Frontier Foundation
Kris Carpenter Negulescu, Internet Archive
Duane Valz, Yahoo! Inc.
Jason Schultz, UC Berkeley School of Law
Abdy Raissinia, IBM

Related Projects

Software Patent Institute: [1]

Open Source as Prior Art [2]

Patent Commons [3]

Peer to Patent Project [4]

Legal:Prior Art