Internet-Draft Robots Exclusion Protocol Extension to m October 2024
Canel & Madhavan Expires 24 April 2025
Workgroup:
Internet Engineering Task Force
Internet-Draft:
draft-canel-robots-ai-control-00
Updates:
9309 (if approved)
Published:
Intended Status:
Informational
Expires:
24 April 2025
Authors:
F. Canel, Ed.
Microsoft Corporation
K. Madhavan
Microsoft Corporation

Robots Exclusion Protocol Extension to manage AI content use

Abstract

This document extends RFC 9309 by specifying additional rules for controlling how content may be used in the field of Artificial Intelligence (AI).

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 24 April 2025.

1. Introduction

The Robots Exclusion Protocol enables service owners to control how, if at all, automated clients known as crawlers may access the URIs on their services as defined by [RFC8288]. The protocol does not, however, provide controls over how the data returned by those services may be used to train generative AI foundation models.

Application developers are requested to honor the rules defined in this document. These rules are not, however, a form of access authorization.

2. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. Specification

3.1. Robots Control Rules

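Two rules complement the existing Allow and Disallow rules: AllowAITraining, which indicates that the content may be used for AI training, and DisallowAITraining, which indicates that it may not.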

The values are case-insensitive and honor the same matching logic as Allow and Disallow rules. Whereas Allow and Disallow rules determine whether the content may be downloaded, AllowAITraining and DisallowAITraining rules govern only whether the content may be used for AI training.
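
As a non-normative illustration of the matching behavior described above, the following Python sketch evaluates AllowAITraining and DisallowAITraining lines taken from a robots.txt file. It assumes the longest-match logic of RFC 9309 restricted to plain prefix matching (no "*" or "$" wildcards), ignores user-agent grouping, and treats the absence of a matching rule as permission. None of these choices is specified by this document, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

# Rule names come from this document; matched case-insensitively.
AI_RULES = {"allowaitraining": True, "disallowaitraining": False}


def parse_ai_rules(robots_txt: str) -> List[Tuple[bool, str]]:
    """Collect (allowed, path_prefix) pairs for the AI training rules.

    Simplification (assumption): user-agent groups are ignored, so every
    AI training rule in the file is treated as applying to the caller.
    """
    rules = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        allowed = AI_RULES.get(name.strip().lower())
        path = value.strip()
        if allowed is not None and path:
            rules.append((allowed, path))
    return rules


def may_train_on(path: str, rules: List[Tuple[bool, str]]) -> bool:
    """Longest matching prefix wins; a tie goes to the allow rule.

    Assumption: a path that matches no rule is treated as permitted.
    """
    best: Optional[Tuple[int, bool]] = None
    for allowed, prefix in rules:
        if path.startswith(prefix):
            candidate = (len(prefix), allowed)  # True sorts above False on ties
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]
```

For a file containing "DisallowAITraining: /" and "AllowAITraining: /public/", this sketch permits AI training on /public/page.html and refuses it for /private/page.html, because the longer prefix is the more specific match.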

3.2. Application Layer Response Header

The same rules can also be set in the Application Layer Response Header.

The values are case-insensitive and honor the same matching logic as Allow and Disallow rules.
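
A consumer might interpret such a response header value along the following lines. This Python sketch is illustrative only: this section does not name the header field or define its syntax, so the function takes the raw field value and assumes a comma-separated list of tokens. The function name and the choice to let DisallowAITraining win when both tokens appear are likewise assumptions.

```python
from typing import Optional


def ai_training_from_field_value(field_value: str) -> Optional[bool]:
    """Return True or False for an explicit signal, None if there is none.

    Assumption: the field value is a comma-separated list of tokens.
    """
    tokens = {token.strip().lower() for token in field_value.split(",")}
    if "disallowaitraining" in tokens:
        return False  # assumption: the restrictive rule wins if both appear
    if "allowaitraining" in tokens:
        return True
    return None
```

Returning None rather than a default keeps the decision about unlabeled content with the caller, since this document does not state one.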

3.3. HTML Meta Element

The same rules can also be set via an HTML meta element.
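
One way a parser could look for these rules in a page is sketched below in Python, using the standard library's html.parser module. The exact markup is not reproduced in this section, so the sketch assumes a meta element whose name attribute is "robots" and whose content attribute carries the rule as one of several comma-separated tokens. Both the attribute values and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import Optional


class AITrainingMetaParser(HTMLParser):
    """Record an AI training rule found in a meta element, if any."""

    def __init__(self, meta_name: str = "robots"):  # meta name is an assumption
        super().__init__()
        self.meta_name = meta_name.lower()
        self.allowed: Optional[bool] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != self.meta_name:
            return
        content = attr.get("content", "")
        tokens = {token.strip().lower() for token in content.split(",")}
        if "disallowaitraining" in tokens:
            self.allowed = False  # assumption: the restrictive rule wins
        elif "allowaitraining" in tokens and self.allowed is None:
            self.allowed = True


def ai_training_from_html(html: str) -> Optional[bool]:
    """Return True or False for an explicit rule, None if the page has none."""
    parser = AITrainingMetaParser()
    parser.feed(html)
    return parser.allowed
```

Because html.parser lowercases tag and attribute names, only the attribute values need explicit case folding to satisfy the case-insensitivity described for the rules.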

4. IANA Considerations

TODO: https://www.rfc-editor.org/rfc/rfc9110.html#name-field-name-registry