CSS space expansion

Unofficial Proposal Draft,

This version:
https://specs.rivoal.net/css-space-expansion/
Issue Tracking:
GitLab
Inline In Spec
GitHub Issues
Editor:
Florian Rivoal

Abstract

This proposal explores a way to turn Zero Width Spaces into visible spaces. The driving use case is support for optional spacing in Japanese (known as “分かち書き”) for the benefit of language learners and people with dyslexia. If adopted, this proposal is expected to be incorporated into [CSS-TEXT-3] or [CSS-TEXT-4].

CSS is a language for describing the rendering of structured documents (such as HTML and XML) on screen, on paper, in speech, etc.

1. Introduction

This section is non-normative.

In a number of languages and writing systems, such as Japanese or Thai, words are not delimited by spaces (or any other character) as they are in English (see Approaches to line breaking for a discussion of the approaches various languages take to word separation and line breaking).

However, even if text without spaces is the dominant style in such languages, there are cases where making word boundaries (or phrase boundaries) visible through the use of spaces is desired. This is a purely stylistic effect, with no implications for the semantics of the text.

In Japan, for instance, this is commonly done in books for people learning the language, such as young children or foreign students. People with dyslexia also tend to find this style easier to read. Recent pushes by the Japanese government for electronic textbooks have raised the demand for this type of feature, and proprietary ebook solutions are being proposed.

The mechanism proposed in this specification builds upon the existing use of the wbr element or of U+200B ZERO WIDTH SPACE in the document markup as a word (or phrase) delimiter. While this practice is not that common, it is a semantically valid use of that Unicode character, and the ability to trigger stylistic effects based on it can only encourage its use.

2. Expanding Zero Width Spaces: the zero-width-space-expansion property

Name: zero-width-space-expansion
Value: none | space | ideographic-space
Initial: none
Applies to: inline boxes
Inherited: yes
Percentages: N/A
Media: visual
Computed value: as specified
Canonical order: per grammar
Animation type: discrete

This name is too verbose. To be bikeshedded.

Should we allow more freeform values, like <string>, possibly limited to 1 character?

This property causes all instances of U+200B ZERO WIDTH SPACE to be replaced by the specified character. Instances of wbr are considered equivalent to U+200B, and are also replaced. This substitution happens before layout, so all layout operations that depend on the characters in the content (such as the white space processing rules of CSS Text Module Level 3, line breaking, or intrinsic sizing) must use that character instead of the original U+200B.

none
This property has no effect.
space
All instances of U+200B ZERO WIDTH SPACE are replaced by U+0020 SPACE.
ideographic-space
All instances of U+200B ZERO WIDTH SPACE are replaced by U+3000 IDEOGRAPHIC SPACE.
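
As a rough illustration of how the two non-initial values might differ in practice, the sketch below applies each to its own paragraph class. The class names are hypothetical, and the comments reflect one reading of the rule that substitution happens before layout, not a statement of settled behavior.

```css
/* Hypothetical class: delimiters become U+0020 SPACE.
   Because the substitution precedes layout, these spaces would be
   handled by white space processing like any authored U+0020
   (for instance, two adjacent delimiters could collapse into one
   space under white-space: normal). */
p.narrow-spacing {
  zero-width-space-expansion: space;
}

/* Hypothetical class: delimiters become U+3000 IDEOGRAPHIC SPACE.
   U+3000 is not a collapsible white space character in CSS Text
   Level 3, so each substituted character would be expected to
   remain in the rendered text. */
p.wide-spacing {
  zero-width-space-expansion: ideographic-space;
}
```

The wider U+3000 matches the full-width spacing used in the Japanese examples in this section, while U+0020 may suit scripts that are typically set with narrower spaces.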

Like text-transform, this property transforms text for styling purposes. It has no effect on the underlying content, and must not affect the content of a plain text copy & paste operation.

This almost looks like instead of a new property, we could just add two new values of the text-transform property. However:

Unlike books for adults, Japanese books for young children often feature spaces between sentence segments, to facilitate reading.

Absent any particular styling, the following sentence would be rendered as depicted below.

<p>むかしむかし、<wbr>あるところに、<wbr>おじいさんと<wbr>おばあさんが<wbr>すんでいました。

むかしむかし、あるところに、おじいさんとおばあさんがすんでいました。


Phrase-based spacing can be achieved with the following CSS:

p {
  zero-width-space-expansion: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。


Another common variant additionally restricts the allowable line breaks to these phrase boundaries. Using the same markup, this is easily achieved with the following CSS:

p {
  word-break: keep-all;
  zero-width-space-expansion: ideographic-space;
}

むかしむかし、 あるところに、 おじいさんと おばあさんが すんでいました。

In addition to making the source code more readable, using wbr rather than U+200B in the markup also allows authors to classify the delimiters into different groups.

In the following example, wbr elements are either unmarked when they delimit a word, or marked with class p when they also delimit a phrase.

<p>らいしゅう<wbr>の<wbr>じゅぎょう<wbr>に<wbr class=p
>たいこ<wbr>と<wbr>ばち<wbr>を<wbr class=p
>もって<wbr>きて<wbr>ください。

Using this, it is possible to enable not only the rather common phrase-based spacing, but also word-by-word spacing, which is likely to be preferred by people with dyslexia because it reduces ambiguities, or other variants such as a combination of phrase-based spacing and word-based wrapping.

Usual rendering

らいしゅうのじゅぎょうにたいことばちをもってきてください。


Phrase spacing
p wbr.p {
  zero-width-space-expansion: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing
p wbr {
  zero-width-space-expansion: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。


Phrase spacing, word wrapping
p {
  word-break: keep-all;
}
p wbr.p {
  zero-width-space-expansion: ideographic-space;
}

らいしゅうのじゅぎょうに たいことばちを もってきてください。


Word spacing and wrapping
p {
  word-break: keep-all;
}
p wbr {
  zero-width-space-expansion: ideographic-space;
}

らいしゅう の じゅぎょう に たいこ と ばち を もって きて ください。
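
One further variant, offered here only as a sketch and not drawn from existing practice, combines both non-initial values: word boundaries receive a narrow U+0020 while phrase boundaries receive a wider U+3000. It assumes the same markup as above, in which class p marks the phrase delimiters.

```css
/* Every delimiter: narrow space between words. */
p wbr {
  zero-width-space-expansion: space;
}

/* Phrase delimiters only: wider ideographic space.
   The added class makes this selector more specific than "p wbr",
   so this declaration wins for wbr elements with class p. */
p wbr.p {
  zero-width-space-expansion: ideographic-space;
}
```

Whether such two-level spacing actually helps readers is untested here; the point is only that classifying delimiters in the markup leaves this kind of choice to the style sheet.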

Appendix A. Security and Privacy Considerations

This appendix is non-normative.

There are no known security or privacy impacts of this feature.

The W3C TAG is developing a Self-Review Questionnaire: Security and Privacy for editors of specifications to informatively answer. As far as currently known, here are the answers to the Questions to Consider:

Does this specification deal with personally-identifiable information?
No.
Does this specification deal with high-value data?
No.
Does this specification introduce new state for an origin that persists across browsing sessions?
No.
Does this specification expose any other data to an origin that it doesn’t currently have access to?
No.
Does this specification enable new script execution/loading mechanisms?
No.
Does this specification allow an origin access to a user’s location?
No.
Does this specification allow an origin access to sensors on a user’s device?
No.
Does this specification allow an origin access to aspects of a user’s local computing environment?
No.
Does this specification allow an origin access to other devices?
No.
Does this specification allow an origin some measure of control over a user agent’s native UI?
No.
Does this specification expose temporary identifiers to the web?
No.
Does this specification distinguish between behavior in first-party and third-party contexts?
No.
How should this specification work in the context of a user agent’s "incognito" mode?
No difference in behavior is expected or needed.
Does this specification persist data to a user’s local device?
No.
Does this specification have a "Security Considerations" and "Privacy Considerations" section?
Yes, this is the role of this Appendix.
Does this specification allow downgrading default security characteristics?
No.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Advisements are normative sections styled to evoke special attention and are set apart from other normative text with <strong class="advisement">, like this: UAs MUST provide an accessible alternative.

Conformance classes

Conformance to this specification is defined for three conformance classes:

style sheet
A CSS style sheet.
renderer
A UA that interprets the semantics of a style sheet and renders documents that use them.
authoring tool
A UA that writes a style sheet.

A style sheet is conformant to this specification if all of its statements that use syntax defined in this module are valid according to the generic CSS grammar and the individual grammars of each feature defined in this module.

A renderer is conformant to this specification if, in addition to interpreting the style sheet as defined by the appropriate specifications, it supports all the features defined by this specification by parsing them correctly and rendering the document accordingly. However, the inability of a UA to correctly render a document due to limitations of the device does not make the UA non-conformant. (For example, a UA is not required to render color on a monochrome monitor.)

An authoring tool is conformant to this specification if it writes style sheets that are syntactically correct according to the generic CSS grammar and the individual grammars of each feature in this module, and meet all other conformance requirements of style sheets as described in this module.

Requirements for Responsible Implementation of CSS

The following sections define several conformance requirements for implementing CSS responsibly, in a way that promotes interoperability in the present and future.

Partial Implementations

So that authors can exploit the forward-compatible parsing rules to assign fallback values, CSS renderers must treat as invalid (and ignore as appropriate) any at-rules, properties, property values, keywords, and other syntactic constructs for which they have no usable level of support. In particular, user agents must not selectively ignore unsupported property values and honor supported values in a single multi-value property declaration: if any value is considered invalid (as unsupported values must be), CSS requires that the entire declaration be ignored.
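
For example, the fallback pattern described above could look like the following when applied to the property proposed in this document. This is a sketch under the assumption that a renderer may support neither word-break: keep-all nor zero-width-space-expansion; the selector is arbitrary.

```css
p {
  /* Baseline value, expected to be understood by every CSS renderer. */
  word-break: normal;

  /* A renderer with no usable support for keep-all must treat this
     declaration as invalid and ignore it, leaving the value above
     in effect. */
  word-break: keep-all;

  /* Proposed property: a renderer that does not support it ignores
     the whole declaration, so delimiters stay zero-width. */
  zero-width-space-expansion: ideographic-space;
}
```

In a renderer that ignores the last declaration, the text is laid out as it is today, which is the "usual rendering" shown in the earlier examples.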

Implementations of Unstable and Proprietary Features

To avoid clashes with future stable CSS features, the CSSWG recommends following best practices for the implementation of unstable features and proprietary extensions to CSS.

Implementations of CR-level Features

Once a specification reaches the Candidate Recommendation stage, implementers should release an unprefixed implementation of any CR-level feature they can demonstrate to be correctly implemented according to spec, and should avoid exposing a prefixed variant of that feature.

To establish and maintain the interoperability of CSS across implementations, the CSS Working Group requests that non-experimental CSS renderers submit an implementation report (and, if necessary, the testcases used for that implementation report) to the W3C before releasing an unprefixed implementation of any CSS features. Testcases submitted to W3C are subject to review and correction by the CSS Working Group.

Further information on submitting testcases and implementation reports can be found on the CSS Working Group’s website at http://www.w3.org/Style/CSS/Test/. Questions should be directed to the public-css-testsuite@w3.org mailing list.

References

Normative References

[CSS-DISPLAY-3]
Tab Atkins Jr.; Elika Etemad. CSS Display Module Level 3. URL: https://drafts.csswg.org/css-display/
[CSS-SIZING-3]
Tab Atkins Jr.; Elika Etemad. CSS Intrinsic & Extrinsic Sizing Module Level 3. URL: https://drafts.csswg.org/css-sizing-3/
[CSS-TEXT-3]
Elika Etemad; Koji Ishii; Florian Rivoal. CSS Text Module Level 3. URL: https://drafts.csswg.org/css-text-3/
[CSS-VALUES-3]
Tab Atkins Jr.; Elika Etemad. CSS Values and Units Module Level 3. URL: https://drafts.csswg.org/css-values-3/
[CSS-VALUES-4]
Tab Atkins Jr.; Elika Etemad. CSS Values and Units Module Level 4. URL: https://drafts.csswg.org/css-values-4/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

Informative References

[CSS-TEXT-4]
Elika Etemad; Koji Ishii; Alan Stearns. CSS Text Module Level 4. URL: https://drafts.csswg.org/css-text-4/

Property Index

Name: zero-width-space-expansion
Value: none | space | ideographic-space
Initial: none
Applies to: inline boxes
Inherited: yes
Percentages: N/A
Media: visual
Animation type: discrete
Canonical order: per grammar
Computed value: as specified

Issues Index

This name is too verbose. To be bikeshedded.
Should we allow more freeform values, like <string>, possibly limited to 1 character?
This almost looks like instead of a new property, we could just add two new values of the text-transform property. However: