Revision as of 16:53, 4 June 2008

This page contains the project plan for a software prior art initiative under consideration. This page is a work in progress, and we welcome constructive input. If you see sections that need help (there are many) or have suggested improvements, please make the changes.


Summary

By creating a user-generated content (UGC) tool with an open catalog system, we can help accused patent infringers and other interested parties assess the validity of software patents. Users will be able to search descriptions (tags) of non-patent prior art references, including both commercial and non-commercial software implementations, and compare them against the defined elements of a patent's claims. The tools are designed for any interested party or defendant to use to find invalidating prior art software. The UGC is posted by any interested party and tagged by the poster in enough detail that the internal processes, modules, and structures of the software reference can be easily matched against patent claim elements. The catalog would be populated both incrementally, by interested parties who wish to record their software developments (commercial and non-commercial), and through data dumps from prior patent litigation cases. Each domain area and its associated taxonomy can grow incrementally with the assistance of community editorial groups.

Goals

Create a free, open, searchable database of non-patented software prior art to help give proper credit and prevent invalid patents from being asserted.

Provide software developers a means to document, define, and record their inventions without filing a patent.

Prevent others from subsequently filing patents on innovations developed first by someone else.

Note: the above holds mainly in the US, where prior invention takes precedence over the filing date. In many other countries, the filing date is what matters.

Problem Statement

The history of innovations in software is not well recorded. Consequently, when presented with a patent whose validity is in question, it is difficult and expensive to identify and find non-patent prior art that may invalidate it.

It is difficult because identifying and finding invalidating prior art depends on effectively comparing the elements, modules, and processes in a software reference against the specific claims of a patent. Because the source code of the vast majority of software references is not initially available except by subpoena, there is no effective way to determine whether a software reference discloses the elements of a patent claim. As stated by the Software Patent Institute "...What is needed is not the detailed code but some level of description of what is in the code. ... To effectively find prior art, algorithms, control flow, data structures, and underlying processes must be transparent in commercial and non-commercial code, which today it is not."

The result, in conjunction with other systemic factors in both the patent examination process and the litigation process, is that:

Too many software patents are issued that are, in fact, not novel or are obvious. This diminishes the integrity of the patent process and allows the unjust extraction of value from a commercial ecosystem.

The assertion and enforcement of such patents imposes significant and material legal defense costs on defendants, particularly for emerging companies and start-ups, which causes the diversion of resources from innovation and development to non-productive litigation defense.

Benefits

Ensure that only valid patents are asserted and enforceable.

Ensure that the actual inventors are credited with their innovations. An invalid software patent grants control over inventions to people who didn’t create them.

Make the prior art value of real software more accessible, because most coded software innovation has not been patented or otherwise adequately documented to date.

Reduce cost (time and money) of identifying prior art.

Project Scope

What's Included

The database should include the following kinds of software and corresponding meta-data:

Commercial Software Products
Non-commercial Software Products
Published papers and research projects
Open source projects
Technical Disclosures

The database can also contain either: 1) links to a trusted repository where the reference is stored; or 2) a copy of the reference itself.

What's Not Included

The database is not intended to address patent publications, although it could (e.g., relevant patent references can be included as meta-tags to software references in the database). (There are already good search tools and efforts to make patent searches better.)

Intended Audience

Software Developers Seeking to Understand or Obtain the Prior Art
Members of Open Source Communities Seeking to Share Innovations
Individuals or Entities Subject to a Patent Suit or Adversarial License Negotiations
Extendable to the USPTO for Researching Prior Art in connection with the Examination of Patent Applications claiming software or software-related inventions
Extendable to All Persons Researching Prior Art in Software or Other Technology fields

Milestones

Alpha Version. The goal is to define a potential system and build an initial implementation with the most basic features this year. Let's call this first rough implementation an "Alpha" version.

Testing: Once that's complete, we want to test the Use Cases to see whether the system works as intended and to identify critical enhancements and/or obstacles. This testing needs to include potential users (developers, defendants, attorneys).

Evaluation. Based on the feedback, we need to determine whether the project is viable and, if so, what enhancements are needed to overcome key obstacles. If it doesn't work, or we can't make the system easy enough to succeed, we need to try something else. If it is viable, we will make critical changes and scale up testing again to a broader and larger audience.

Beta Version. Rinse and repeat.

Use Cases

Searching for a Reference

A user knows of a software patent and wishes to find prior art related to the patent.
Prior art in this case being one or more references (document, executable code or description of software/system) which contain(s) all of the elements of at least one claim of the patent.
The user knows the patent claim elements.
The user accesses the database to perform a search using the claim elements.
A search screen is presented that allows the user to search on multiple fields which correspond to the patent claim elements as well as other meta-data such as date, author, field of use.
The claim elements for the search are presented as predetermined tags from a drop down list.
The user can add additional tags which, after review by a moderator, are added to the tagging schema.
Search results are returned which reflect matches to the search criteria.
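The search use case above boils down to comparing a set of claim elements against each reference's tags and flagging references that cover all of them. The sketch below assumes a hypothetical `Reference` record with ISO 8601 date strings and free-form tags, case-insensitive matching, and an optional date cutoff; none of these names come from an existing system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reference:
    """A catalogued prior art reference (hypothetical schema)."""
    name: str
    date: str                      # ISO 8601 earliest use/disclosure date
    tags: set = field(default_factory=set)


def search_by_claim(refs, claim_elements, before: Optional[str] = None):
    """Return (reference, matched_elements, covers_all) tuples, best first.

    Matching is case-insensitive. A reference covering every element is a
    candidate for disclosing the whole claim; partial matches are kept
    because several references may be combined.
    """
    wanted = {e.lower() for e in claim_elements}
    hits = []
    for ref in refs:
        if before is not None and ref.date >= before:
            continue  # only references dated before the cutoff count
        matched = wanted & {t.lower() for t in ref.tags}
        if matched:
            hits.append((ref, sorted(matched), matched == wanted))
    # Full matches first, then by number of claim elements covered
    hits.sort(key=lambda h: (h[2], len(h[1])), reverse=True)
    return hits
```

ISO 8601 dates compare correctly as strings, which keeps the cutoff check simple; a real system would likely store proper date types.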

Reference Input

Incremental Prior Art includes self-contained printed publications, such as a PDF of an article, a segment of code, or any other document, online or offline, whose material is contained within its four corners.

A User Wants to Record A Software Development

+ Initial Page allows User to create a reference and should:
1. Note whether other pages exist with this exact name;
2. Provide a search to check for existing tags or spellings;
3. Suggest other categories of pages and tags covering similar subject matter;
4. Suggest creating the User's page under the category or tag initially searched;
5. Suggest searching existing pages for the category or tag initially searched.

+ User's Page has blanks prompting submissions in the following categories:
1. Description: a plain-language summary (<50 words) of the subject matter of the development;
2. Relevant excerpt of novel provisions (e.g., the sentence around the tagged area);
3. Copy of the document or code itself;
4. Executable file or permalink to same;
5. Other related documents and references.

+ A place for these documents to be stored.
+ A function that automatically fetches files from a given URL and copies them into our repository.

User Wants to Tag a Document So Later Users Can Find It

A user knows of prior art related to a software patent or believes that the reference is novel.
Prior art in this case being one or more references (document, executable code or description of software/system) which contain(s) all of the elements of at least one claim of the patent.
An input screen is presented that allows the user to enter and tag on multiple fields which correspond to the patent claim elements as well as other meta-data such as date, author, field of use.
Each of these multiple fields that corresponds with a field on another piece of prior art will link through to a web-page that allows users to add content and English-language descriptions in a wiki-style format.
Each of these web pages allows the users to create links between them, such that where a synonym for a term is used, links to the related terms appear at the top of each page.
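One way to model the synonym links described above is an undirected graph of tag pages, where the "related terms" shown at the top of a page are every tag reachable through links. This is a minimal sketch under that assumption; the `TagPages` class is hypothetical, and following links transitively (RAM to memory to storage) is a design choice the plan does not specify.

```python
from collections import defaultdict


class TagPages:
    """Wiki-style tag pages with symmetric synonym links (hypothetical model)."""

    def __init__(self):
        self.synonyms = defaultdict(set)  # tag -> directly linked tags

    def link(self, a, b):
        # Links are stored in both directions so each page shows the other.
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)

    def related(self, tag):
        """All tags reachable through synonym links, for display at page top."""
        seen, stack = {tag}, [tag]
        while stack:
            for nxt in self.synonyms[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        seen.discard(tag)
        return sorted(seen)
```

Restricting `related` to direct links only would give shorter, more precise lists at the cost of missing indirect synonyms.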

User Wants to Record A Collection of Prior Art References

Case files, such as those maintained on PACER in active federal patent litigation, contain a wealth of information. For example, answers to interrogatories often include a laundry list of potential prior art. To collect and maintain the most information with limited input, these documents need to be tagged and catalogued.
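A bulk import could turn such a laundry list into draft records in one pass. The sketch below assumes a hypothetical CSV layout (`name,date,tags`, with space-separated tags) transcribed from a case file; the `import_bulk` function and the use of the case identifier as a tag are illustrative assumptions, while the "PatentProof" tag follows the convention proposed later on this page for references awaiting deeper analysis.

```python
import csv
import io


def import_bulk(csv_text, case_id):
    """Parse a bulk list of prior art references into draft records.

    Assumes a hypothetical CSV with columns name,date,tags (tags
    space-separated). Each record is tagged with the source case id
    (which must not contain spaces) and with "PatentProof".
    """
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = (row.get("name") or "").strip()
        if not name:
            continue  # skip rows with no reference name
        tags = set((row.get("tags") or "").split())
        tags.update({"PatentProof", case_id})
        records.append({
            "name": name,
            "date": (row.get("date") or "").strip(),
            "tags": tags,
        })
    return records
```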

Components

Online Database (Freebase) that functions as a repository for UGC, including meta-data and (optionally) the actual code or a link to a repository.

Search interface that supports multi-element search against meta-data. Should support pattern matching, i.e., matching patent claim elements against meta-data tags.

Input interface to support input of references (single input and bulk input).

Taxonomy which can be used to index and tag references. Must be easily extensible.

Hosting. Hosting facility for online database.

User Registration module for log-in, user preference management, and permissions management.

Tags & Meta-data

What is a Tag?

Tags are keywords, but they link through to wiki-style information pages. Some of the tags (see bottom of this section) are predetermined. These are Tags that we've chosen to provide some consistency across pages and categories. But others may be chosen by the user. So, tagging can be a lot easier and more flexible than fitting your information into preconceived categories or folders.

For example, if you save an article about how to write a certain kind of program from a workstation, you can tag it with terminology describing the program and also with workstation, server, microcomputer, Unix, RISC, or whatever other tags you might use to find it again. You don't have to rely on the designer of a system to provide you with a category for computer programs or workstations. You make up tags as you need them, and use the tags that make the most sense to you.

This is great for organizing and finding prior art and new publications, but it goes even further when someone else posts related content using the same tags. You begin building a collaborative repository of related information, driven by personal interests and creative organization. Now, as the same tags are linked to, you can find all of the prior art sharing common tags and add to the information pages about them.

How do I tag?

When saving a new publication you authored, use the tags field. In this field, enter as many tags as you would like, each separated by a space. Recommended tags are a combination of tags you have already used and tags that other people have used. You are under no obligation to use these! They are only there to help you. What tags or words would help you remember this page a few years from now? That's a good place to start.

The only limitation on tags is that they must not include spaces. So if your program was written on a Unix workstation, it should get two separate tags (workstation and Unix) if it could run on any workstation and this one merely happens to run Unix, or a single tag, "Unixworkstation", if a Unix workstation is the only place the program works. You probably don't want to use commas either, since a comma will become part of the tag.
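The tagging rules above (tags separated by spaces, commas kept as part of a tag) can be captured in a few lines. This is a minimal sketch; `parse_tags` is a hypothetical helper, and dropping repeated tags while keeping first-seen order is an added assumption.

```python
def parse_tags(field_text):
    """Split a tags field on whitespace into an ordered list of unique tags.

    Commas are NOT separators: "Unix," is a different tag from "Unix",
    which is why commas are best avoided.
    """
    # dict.fromkeys drops duplicates while preserving insertion order
    return list(dict.fromkeys(field_text.split()))
```

For example, `parse_tags("Unix, workstation Unix")` yields `["Unix,", "workstation", "Unix"]`, showing how a stray comma silently creates a second, distinct tag.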

Below is an initial list of the kinds of Tags we've chosen that should be used to describe the references wherever the information is known.

Reference Name
Software Type Category(s)
Sub-Type Category(s)
Purpose of the Software
Owner
Developers/Creators of the Software
Earliest Date of Use or Disclosure
Combinations (Tags to list the kinds of other software, systems, hardware the reference is/could be used with)
Software Elements (multiple tags that describe the key components, modules, processes, of the software reference).
Resource Experts Knowledgeable About the Software
Link pointing to an archive copy or trusted repository
Copy of the code, document, or reference

Finally, any reference that is newly created, or that has been found by a user who doesn't have time for a more in-depth analysis, can initially be tagged "PatentProof." This will allow it to be picked up by search engines and tagged by others.

Functional Requirements

Operation

Hosted database accessible via the web
Username/password login required to edit or modify existing entries and for admin functions
No registration required for search or data input
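The operation rules above translate into a small access check. This sketch assumes a hypothetical `Action` enum, treats anonymous visitors as `user=None`, lets any logged-in user edit, and reserves admin actions for an explicit admin set; the plan itself does not say how admins are designated.

```python
from enum import Enum


class Action(Enum):
    SEARCH = "search"
    INPUT = "input"    # submit a new reference
    EDIT = "edit"      # modify an existing entry
    ADMIN = "admin"


def is_allowed(action, user=None, admins=frozenset()):
    """Apply the access rules above; `user` is None for anonymous visitors."""
    if action in (Action.SEARCH, Action.INPUT):
        return True          # no registration required
    if user is None:
        return False         # editing and admin require a login
    if action is Action.EDIT:
        return True          # assumption: any logged-in user may edit
    return user in admins    # admin actions need an explicit grant
```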

Appearance & Performance

Target 98% uptime
Return search results fast
Target 20 simultaneous users
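To make the 98% uptime target concrete, the allowed downtime over a 365-day year works out as follows (the monthly figure simply divides by 12):

```latex
\text{downtime} = (1 - 0.98) \times 365 \times 24\,\text{h}
               = 0.02 \times 8760\,\text{h}
               = 175.2\,\text{h/year} \approx 7.3\,\text{days/year}
\qquad
\frac{175.2\,\text{h}}{12} = 14.6\,\text{h/month}
```

That is a fairly loose target, which seems consistent with an Alpha version aimed at about 20 simultaneous users.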

Web Content

We need the capacity to include at least the following:

Search results/output
FAQ
Instructions for how to use the system (text, video, exemplars)

We need to figure out:

How many items of information appear on a single output page (e.g. tabs vs. hyperlinks)
What kind of interface would convey the information more succinctly and intuitively
How the user would interact with the output to provide feedback on it, edit it, add to it, etc.

Data Input

Input meta-data that describes a reference (e.g., for a software application: date, name, etc.)
Determine whether the meta-data already exists: if the record is not new, present similar records; if it is new, allow new input.
Input Tags which describe reference functionality
Reference functionality includes features of the software that describe the processes, modules, functions of the software.
Upload electronic copies of reference documents (PDF, DOC, XLS, PPT, Google Docs, TIF, JPEG)
OCR electronic references if possible
Reference may consist solely of a written description in narrative form
Reference may have multiple tags
User can select from predetermined list of tags
User can add new tags and the new tags become part of the pre-select list
System has a semantic pattern function that presents related words when a user inputs a tag; e.g., if the user inputs "data storage," the system displays related words such as storage, RAM, and memory for the user to select.
Admin capability to modify/delete/add tags to the list
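The "determine if meta-data already exists" step can be approximated with fuzzy string matching on reference names. This sketch uses Python's standard `difflib.get_close_matches`; the `check_new_reference` function, the case-insensitive comparison, and the 0.8 similarity cutoff are illustrative assumptions, not tuned values.

```python
import difflib


def check_new_reference(name, existing_names, cutoff=0.8):
    """Decide whether a submitted reference name is new.

    Returns ("exists", [name]) on an exact case-insensitive match,
    ("similar", candidates) when close spellings exist, else ("new", []).
    """
    by_key = {n.casefold(): n for n in existing_names}
    key = name.casefold()
    if key in by_key:
        return "exists", [by_key[key]]
    close = difflib.get_close_matches(key, list(by_key), n=5, cutoff=cutoff)
    if close:
        return "similar", [by_key[c] for c in close]  # show these to the user
    return "new", []
```

Name similarity alone will miss renamed products, so presenting candidates to the user, rather than rejecting input automatically, seems the safer default.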

Meta-Data

There are two types: i) fixed attributes, which describe the reference and other administrative data related to it; and ii) extensible, more subjective attributes, which describe the functions, purpose, and relatedness of a reference.

Fixed/Objective Characteristics

Below is an initial list of the kinds of Tags we've chosen that should be used to describe the references wherever the information is known.

Reference Name (Fixed)
Software Type Category(s)
Sub-Type Category(s)
Purpose of the Software
Owner (Fixed)
Author of the Software (Fixed)
Earliest Date of Use or Public Disclosure (Fixed)
Other Resources (Experts Knowledgeable About the Software)
- Names
- email contact
Link pointing to an archive copy or trusted repository (suggest SourceForge)
Time-stamp function to track and report date the record was created and published.
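The fields marked "(Fixed)" above, plus the time-stamp function, could be held in a simple record type. This is a sketch with hypothetical field names; it assumes creation time is stamped automatically in UTC and that publication is stamped once and never overwritten.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from typing import Optional


@dataclass
class FixedMetadata:
    """Objective attributes of a reference (hypothetical field names)."""
    reference_name: str
    owner: str
    author: str
    earliest_disclosure: date           # earliest date of use or public disclosure
    archive_link: Optional[str] = None  # archive copy or trusted repository
    experts: list = field(default_factory=list)  # (name, email) pairs
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
    published_at: Optional[datetime] = None

    def publish(self):
        """Record the first publication time; later calls keep the original."""
        if self.published_at is None:
            self.published_at = datetime.now(timezone.utc)
        return self.published_at
```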

Extensible/Subjective Characteristics

Software Type Category(s)
Sub-Type Category(s)
Purpose of the Software
Description of the software
Tags describing the functional elements of the software
Tags to list the kinds of other software, systems, hardware the reference is/could be used with
Cross-reference to other references that may be within the database or outside
Auto-suggestion feature to suggest cross-references
Semantic pattern matching function as described above
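The auto-suggestion feature for cross-references could start with plain tag overlap before any semantic matching exists. This sketch ranks catalog entries by Jaccard similarity (shared tags divided by combined tags); the function name, the `k` and `min_score` defaults, and the catalog shape are all assumptions. The caller is expected to exclude the reference being edited from the catalog.

```python
def suggest_cross_references(ref_tags, catalog, k=3, min_score=0.2):
    """Suggest related references by tag overlap (Jaccard similarity).

    ref_tags: set of tags on the reference being edited.
    catalog: dict mapping reference name -> set of tags.
    """
    scored = []
    for name, tags in catalog.items():
        union = ref_tags | tags
        if not union:
            continue  # both tag sets empty: nothing to compare
        score = len(ref_tags & tags) / len(union)
        if score >= min_score:
            scored.append((score, name))
    # Highest score first; ties broken alphabetically for stable output
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scored[:k]]
```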

Searching

Default Search

Text box for entry of keywords (optional)
Should support options to search against tags/metadata only, or against the text of claims and equivalent resources only
Should support an option for semantic search, e.g., a search for RAM would also return ROM.
Should support search across any combination of the above simultaneously
Should support boolean operators and common syntax such as + (require a term), - (exclude a term), and "" (match an exact phrase)
Must include the option to constrain the search based on required meta data fields
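The +, -, and "" syntax listed above can be parsed with a single regular expression. This is a minimal sketch: `parse_query` and `matches` are hypothetical helpers, matching is case-insensitive substring search (so "disk" would also match "diskette"), and semantic expansion is left out.

```python
import re

# Optional +/- prefix, then either a "quoted phrase" or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]+)"|(\S+))')


def parse_query(query):
    """Split a query into required, excluded, and optional terms/phrases."""
    required, excluded, optional = [], [], []
    for sign, phrase, word in TOKEN.findall(query):
        term = (phrase or word).lower()
        if sign == "+":
            required.append(term)
        elif sign == "-":
            excluded.append(term)
        else:
            optional.append(term)
    return required, excluded, optional


def matches(text, required, excluded):
    """Boolean filter: every required term present, no excluded term present."""
    text = text.lower()
    return all(t in text for t in required) and not any(t in text for t in excluded)
```

Optional terms would then feed relevancy scoring rather than filtering.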

Default Results

20 results per page, sorted by relevancy, if keywords are included, or by date, if metadata alone is searched
Each result listed should include required meta data and a brief text snippet describing the resource, when available
Should be able to filter results based on one or more of the required meta data fields
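The default-results rules (20 per page, relevancy order with keywords, date order otherwise) can be sketched as below. The hit format (a dict with `score` and ISO 8601 `date`) is hypothetical, and sorting oldest first in date mode is an assumption, chosen because earlier disclosures are the ones that can invalidate a patent.

```python
from math import ceil

PAGE_SIZE = 20  # results per page, per the default above


def page_results(hits, page=1, has_keywords=True):
    """Order and paginate search hits; returns (page_of_hits, page_count)."""
    if has_keywords:
        ordered = sorted(hits, key=lambda h: h["score"], reverse=True)
    else:
        ordered = sorted(hits, key=lambda h: h["date"])  # oldest first (assumption)
    pages = max(1, ceil(len(ordered) / PAGE_SIZE))
    start = (page - 1) * PAGE_SIZE
    return ordered[start:start + PAGE_SIZE], pages
```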

Reporting

Categories should be user-definable (add/edit/delete functions).
Field to associate any event (meeting, address, call, etc.) with a category. There could be some default categories (e.g., Family, Friends, Work, Clients, Subcontractor, Restaurants, Computer) or perhaps example category sets for different purposes (e.g., business, software development, education, single person, family).
Events can be filtered based on categories.
Events can be associated with multiple categories.
Events should be color-coded by category.

Exporting

Ability to export data to common data formats (xls, csv)
Ability to export only selected events.
Export to HTML (pretty much the same as Print).
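CSV export, including export of only selected entries, needs little beyond Python's standard `csv` module. In this sketch the record shape and the `export_csv` name are assumptions; multi-valued fields such as tags are joined with spaces to match the space-separated tag convention.

```python
import csv
import io


def export_csv(records, selected=None, fields=("name", "date", "tags")):
    """Export records (all, or only those whose name is in `selected`) as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fields)
    for rec in records:
        if selected is not None and rec["name"] not in selected:
            continue  # export only the selected entries
        row = []
        for f in fields:
            value = rec.get(f, "")
            if isinstance(value, (set, list, tuple)):
                value = " ".join(sorted(value))  # space-separated, like tags
            row.append(value)
        writer.writerow(row)
    return out.getvalue()
```

Note that `csv.writer` ends rows with `\r\n` by default, which spreadsheet applications handle without trouble.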

Printing

Print search results or descriptions of references

Administrative Console

Manage taxonomies
Create reports
Run reports
Allocate permissions to others to manage domain areas in a given taxonomy

Outreach and Activism

Getting the word out to the technological and legal communities is an important part of creating a large-scale database that is usable by the public. Suggestions for how to do so are as follows:

1. Make an effort to consider technologists' motivations and incentives to tag their work proactively rather than reactively. We seek to inform innovators of the benefits of sharing their work with others and, at the same time, of the benefits of officially disclosing art in a manner that may prevent future patentees from seeking coverage of the same. Building a shared knowledge base will equip the public to better handle potential patent suits.

How Can You Get Involved

Weekly Call

There is a weekly project team call on Wednesdays at 11am PDT. If you would like the dial-in, please submit info here and we'll send you the dial-in details.

Expertise

Taxonomy expertise (technical and domain)
Legal
Webdev
Data architect
UI Design
Search Design

Resources

Challenges and Criticisms

One overarching concern is that prior art is vast, poorly organized, undocumented, and difficult to describe, and that the database won't attract community participation. The beneficiaries of a prior art database are patent applicants, defendants in future infringement cases, and the individuals or companies that would have been defendants in infringement suits that the database prevented by keeping the relevant patents from ever issuing. The first group includes adversaries, and participants have no way to tell if they fall into either of the other two groups.
We need tools that facilitate the kind of community participation we have today. Open source is something that merits protection, and that may help overcome resistance to participating in an enterprise that appears to legitimize software patents (a big problem for SPI, which had a number of corporate backers).
An alternative would be a rapid response system that would react only to clear threats, such as pending litigation.
Another alternative would be decentralized disclosure: training developers to publish articles and source code so as to make them useful as prior art references, but in a variety of journals and repositories.
Richard Stallman, in Anatomy of a Trivial Patent, writes, "What's more, the courts are reluctant to overrule the Patent Office, so there is a better chance of getting a patent overturned if you can show a court prior art that the Patent Office did not consider. If the courts are willing to entertain a higher standard in judging unobviousness, it helps to save the prior art for them. Thus, the proposals to 'make the system work better' by providing the Patent Office with a better database of prior art could instead make things worse."
By helping applicants write claims to avoid prior art, a prior art database would reduce the average cost of a valid patent. Since most new patents are not licensed for free software use, a prior art database could increase the total patent threat to free software.
If prior art databases actually hurt patent trolls, patent trolls would have found a reason to complain about them.

Project Team

Harvey Anderson, Mozilla Corporation
Emily Berger, Electronic Frontier Foundation
Kris Carpenter Negulescu, Internet Archive
Duane Valz, Yahoo! Inc.
Jason Schultz, UC Berkeley School of Law

Related Projects

Software Patent Institute: [1]

Open Source as Prior Art [2]

Patent Commons [3]

Peer to Patent Project [4]

Legal:Prior Art: Difference between revisions