Normative (hashing only)

This document is not gameplay documentation. It exists to enable independent verification.

HashClue Reference Implementations: Round 1 Serialization & Hashing

Version: 1.0.0 Status: FINAL. Normative for serialization and hashing; Non-Contract Scope: String canonicalization, SHA-256 hashing, and reference test vectors for HashClue Round 1 submissions.

This document is the canonical authority for how submission strings are constructed and hashed. It does NOT define contract logic, UI behavior, economics, or game rules. If an implementation disagrees with this document, the implementation is wrong.


1. SHA-256 Submission Format

Submissions are committed as SHA-256 hashes of a canonical string.

Procedure

  1. Construct the canonical submission string (see §2–§3).
  2. Encode the string as UTF-8 bytes.
  3. Compute SHA-256 over those bytes.
  4. Output the hash as lowercase hexadecimal (64 characters).

Warnings

  • MUST hash the UTF-8 byte encoding of the canonical string.
  • MUST NOT hash a hex-encoded representation of the string.
  • MUST NOT hash displayed, formatted, or pre-processed coordinate values.
  • MUST NOT hash anything other than the exact canonical string defined below.

2. Canonical Submission String

Template

Base format (3 descriptor tokens):

HC1|<LAT>|<LON>|<ENV>.<HST>.<METHOD>

Extended format (BURIED method with 1-2 anchor tokens):

HC1|<LAT>|<LON>|<ENV>.<HST>.<METHOD>.<ANCHOR_1>[.<ANCHOR_2>]

Separator Rules

PositionSeparatorCharacter
Between prefix and LAT|U+007C VERTICAL LINE
Between LAT and LON|U+007C VERTICAL LINE
Between LON and token group|U+007C VERTICAL LINE
Between ENV and HST.U+002E FULL STOP
Between HST and METHOD.U+002E FULL STOP
Between METHOD and ANCHOR_1 (if present).U+002E FULL STOP
Between ANCHOR_1 and ANCHOR_2 (if present).U+002E FULL STOP
  • The prefix HC1 is literal and case-sensitive.
  • There are exactly three | separators.
  • There are two to four . separators depending on anchor presence.
  • Base format: 3 descriptor tokens, 2 dots
  • Extended format: 4-5 descriptor tokens, 3-4 dots (BURIED only)
  • No trailing newline, no BOM, no surrounding whitespace.

3. Canonicalization Pipeline

All canonicalization is pure string manipulation. Floating-point types MUST NOT be used for coordinate canonicalization. Numeric conversion is permitted only for range validation after the canonical string is produced.

A. Sanitize Inputs

  • Trim leading and trailing whitespace from each raw input (LAT, LON, ENV, HST, METHOD, ANCHOR_1, ANCHOR_2 if provided).

B. Token Canonicalization (ENV, HST, METHOD)

For each base token (ENV, HST, METHOD):

  1. Force UPPERCASE (A-Z).
  2. Validate against regex: ^[A-Z0-9_-]+$.
  3. Validate length: 1–50 characters.
  4. REJECT if the token:
    • Is empty.
    • Contains whitespace.
    • Contains | or ..
    • Contains any character outside [A-Z0-9_-].

B1. Anchor Token Canonicalization (ANCHOR_1, ANCHOR_2 if provided)

For each anchor token (if provided):

  1. Force UPPERCASE (A-Z).
  2. Validate starts with positional prefix: BENEATH_, AT_, BY_, ON_, IN_, NEAR_, BEHIND_, UNDER_
  3. Validate against regex: ^[A-Z0-9_-]+$.
  4. Validate length: 1–50 characters.
  5. REJECT if the token:
    • Is empty.
    • Contains whitespace.
    • Contains | or ..
    • Contains any character outside [A-Z0-9_-].
    • Lacks required positional prefix.
    • Describes transient or movable objects (heuristic check required).

B2. Anchor Requirement Validation

After token canonicalization:

  • If METHOD = BURIED, at least one anchor token (ANCHOR_1) MUST be provided. REJECT if missing.
  • If METHOD ≠ BURIED, anchor tokens MUST NOT be provided. REJECT if present.

C. Coordinate Canonicalization (HC-GEO v1): String-Based

For each coordinate (LAT, LON):

  1. Reject if the value contains any whitespace character.
  2. Reject if the value contains a comma (,).
  3. Reject if the value contains e or E (scientific notation).
  4. Reject if the value matches DMS patterns (contains °, ', ").
  5. Remove a leading + if present.
  6. Capture and remove the leading - sign if present. Record is_negative.
  7. Split the remaining string at the first . into INT_PART and FRAC_PART.
    • If no . is present, set FRAC_PART = "".
  8. Truncate FRAC_PART to at most 6 characters (discard characters beyond position 6; never round).
  9. Strip trailing zeros from FRAC_PART.
  10. If FRAC_PART is now empty, set FRAC_PART = "0".
  11. Remove leading zeros from INT_PART, but if INT_PART becomes empty, set it to "0".
  12. Rejoin as INT_PART.FRAC_PART.
  13. If is_negative is true and the result is not "0.0", prepend "-".
    • Special case: -0.0 MUST be canonicalized to 0.0.

D. Range Validation (Numeric, Post-Canonicalization)

After producing the canonical coordinate strings, implementations MAY convert to a numeric type solely for range checking:

  • Latitude: [-90, 90] inclusive.
  • Longitude: [-180, 180] inclusive.
  • Both -180 and +180 are valid longitudes and MUST NOT be normalized.

E. Assemble Canonical String

"HC1" + "|" + LAT + "|" + LON + "|" + ENV + "." + HST + "." + METHOD

F. Hash

  1. Encode the assembled string as UTF-8 bytes.
  2. Compute SHA-256 over those bytes.
  3. Output as 64-character lowercase hexadecimal string.

4. Unicode

  • No Unicode normalization is required.
  • All canonical strings in v1 are ASCII-only by construction.
  • Unicode is out of scope for v1.

5. Reference Implementations

These reference implementations are provided solely to demonstrate canonical serialization and hashing. They are illustrative, not optimized, and not intended as guidance for automated guessing or production systems.

All examples are SYNTHETIC and ILLUSTRATIVE. No real-world locations are used.

Python

import hashlib
import re

def canonicalize_token(raw: str) -> str:
    """Canonicalize a single token (ENV, HST, or METHOD)."""
    token = raw.strip().upper()
    if not token:
        raise ValueError("Token MUST NOT be empty")
    if len(token) > 50:
        raise ValueError("Token exceeds 50 characters")
    if not re.fullmatch(r'[A-Z0-9_-]+', token):
        raise ValueError(f"Token contains invalid characters: {token!r}")
    return token


def canonicalize_coordinate(raw: str) -> str:
    """Canonicalize a coordinate string (HC-GEO v1, pure string-based)."""
    val = raw.strip()

    # Reject whitespace within value
    if re.search(r'\s', val):
        raise ValueError("Coordinate MUST NOT contain whitespace")
    # Reject comma decimal separators
    if ',' in val:
        raise ValueError("Coordinate MUST NOT contain commas")
    # Reject scientific notation
    if 'e' in val or 'E' in val:
        raise ValueError("Coordinate MUST NOT use scientific notation")
    # Reject DMS
    if any(c in val for c in ('°', "'", '"')):
        raise ValueError("Coordinate MUST NOT use DMS format")

    # Remove leading '+'
    if val.startswith('+'):
        val = val[1:]

    # Capture sign
    is_negative = False
    if val.startswith('-'):
        is_negative = True
        val = val[1:]

    # Split at '.'
    if '.' in val:
        int_part, frac_part = val.split('.', 1)
    else:
        int_part = val
        frac_part = ""

    # Truncate fractional part to 6 characters (no rounding)
    frac_part = frac_part[:6]

    # Strip trailing zeros
    frac_part = frac_part.rstrip('0')

    # If empty, set to '0'
    if not frac_part:
        frac_part = '0'

    # Remove leading zeros from integer part
    int_part = int_part.lstrip('0') or '0'

    # Rejoin
    canonical = f"{int_part}.{frac_part}"

    # Restore sign (but normalize -0.0 to 0.0)
    if is_negative and canonical != "0.0":
        canonical = f"-{canonical}"

    return canonical


def validate_range(lat_str: str, lon_str: str) -> None:
    """Range-check after canonicalization. Numeric conversion allowed here only."""
    lat = float(lat_str)
    lon = float(lon_str)
    if lat < -90 or lat > 90:
        raise ValueError(f"Latitude out of range: {lat}")
    if lon < -180 or lon > 180:
        raise ValueError(f"Longitude out of range: {lon}")


def canonicalize_and_hash(
    lat_raw: str,
    lon_raw: str,
    env_raw: str,
    hst_raw: str,
    method_raw: str,
) -> tuple[str, str]:
    """
    Returns (canonical_string, sha256_hex).
    """
    lat = canonicalize_coordinate(lat_raw)
    lon = canonicalize_coordinate(lon_raw)
    env = canonicalize_token(env_raw)
    hst = canonicalize_token(hst_raw)
    method = canonicalize_token(method_raw)

    validate_range(lat, lon)

    canonical = f"HC1|{lat}|{lon}|{env}.{hst}.{method}"
    sha = hashlib.sha256(canonical.encode('utf-8')).hexdigest()
    return canonical, sha

JavaScript

const crypto = require('crypto');

function canonicalizeToken(raw) {
  const token = raw.trim().toUpperCase();
  if (!token) throw new Error('Token MUST NOT be empty');
  if (token.length > 50) throw new Error('Token exceeds 50 characters');
  if (!/^[A-Z0-9_-]+$/.test(token)) {
    throw new Error(`Token contains invalid characters: "${token}"`);
  }
  return token;
}

function canonicalizeCoordinate(raw) {
  const val0 = raw.trim();

  if (/\s/.test(val0)) throw new Error('Coordinate MUST NOT contain whitespace');
  if (val0.includes(',')) throw new Error('Coordinate MUST NOT contain commas');
  if (/[eE]/.test(val0)) throw new Error('Coordinate MUST NOT use scientific notation');
  if (/[°'"]/.test(val0)) throw new Error('Coordinate MUST NOT use DMS format');

  let val = val0;

  // Remove leading '+'
  if (val.startsWith('+')) val = val.slice(1);

  // Capture sign
  let isNegative = false;
  if (val.startsWith('-')) {
    isNegative = true;
    val = val.slice(1);
  }

  // Split at '.'
  let intPart, fracPart;
  const dotIdx = val.indexOf('.');
  if (dotIdx !== -1) {
    intPart = val.slice(0, dotIdx);
    fracPart = val.slice(dotIdx + 1);
  } else {
    intPart = val;
    fracPart = '';
  }

  // Truncate fractional part to 6 characters (no rounding)
  fracPart = fracPart.slice(0, 6);

  // Strip trailing zeros
  fracPart = fracPart.replace(/0+$/, '');

  // If empty, set to '0'
  if (!fracPart) fracPart = '0';

  // Remove leading zeros from integer part
  intPart = intPart.replace(/^0+/, '') || '0';

  // Rejoin
  let canonical = `${intPart}.${fracPart}`;

  // Restore sign (normalize -0.0 to 0.0)
  if (isNegative && canonical !== '0.0') {
    canonical = `-${canonical}`;
  }

  return canonical;
}

function validateRange(latStr, lonStr) {
  const lat = parseFloat(latStr);
  const lon = parseFloat(lonStr);
  if (lat < -90 || lat > 90) throw new Error(`Latitude out of range: ${lat}`);
  if (lon < -180 || lon > 180) throw new Error(`Longitude out of range: ${lon}`);
}

function canonicalizeAndHash(latRaw, lonRaw, envRaw, hstRaw, methodRaw) {
  const lat = canonicalizeCoordinate(latRaw);
  const lon = canonicalizeCoordinate(lonRaw);
  const env = canonicalizeToken(envRaw);
  const hst = canonicalizeToken(hstRaw);
  const method = canonicalizeToken(methodRaw);

  validateRange(lat, lon);

  const canonical = `HC1|${lat}|${lon}|${env}.${hst}.${method}`;
  const sha = crypto.createHash('sha256').update(canonical, 'utf8').digest('hex');
  return { canonical, sha256: sha };
}

module.exports = { canonicalizeAndHash, canonicalizeCoordinate, canonicalizeToken };

6. Examples: Valid and Invalid Submissions

All examples are SYNTHETIC / ILLUSTRATIVE. No real-world locations are represented.

Valid Inputs → Canonical Strings

Raw LATRaw LONENVHSTMETHODCanonical String
12.340000-56.780000env1hst1mth1HC1|12.34|-56.78|ENV1.HST1.MTH1
+39.498333-0.388335ENV1HST1MTH1HC1|39.498333|-0.388335|ENV1.HST1.MTH1
040.0-003.0env1hst1mth1HC1|40.0|-3.0|ENV1.HST1.MTH1
00ENV1HST1MTH1HC1|0.0|0.0|ENV1.HST1.MTH1
-0.00.0ENV1HST1MTH1HC1|0.0|0.0|ENV1.HST1.MTH1
39.4983335-0.3883335ENV1HST1MTH1HC1|39.498333|-0.388333|ENV1.HST1.MTH1

Invalid Inputs (MUST Reject)

InputFieldReason
39.49 8333LATWhitespace within coordinate
39.498333\nLATNewline in coordinate
39,498333LATComma decimal separator
3.9498333e1LATScientific notation
39°29'54"NLATDMS format
HC1|39.1|-0.4|env1|hst1|mth1StringWrong separators (pipes instead of dots between tokens)
HC1|39.1|-0.4|env1.hst1.mth1TokensLowercase tokens (pre-canonicalization this is accepted; shown for awareness that the canonical form is UPPERCASE)
(empty string)ENVEmpty token
ENV 1ENVWhitespace in token
ENV.1ENVReserved separator . in token
ENV|1ENVReserved separator | in token

7. Reference Test Vectors

All values are SYNTHETIC / ILLUSTRATIVE.

Canonical String → SHA-256

IDCanonical StringSHA-256 (lowercase hex)
AHC1|39.498333|-0.388335|ENV1.HST1.MTH17d52f0b07206c3bc016941f271962d3d341ff5f6855d6183f1cc72dbc75fef80
BHC1|39.49833|-0.388335|ENV1.HST1.MTH16597df11cf8f5f89bed2bbd762dc99e31ee509e5bccc1ba7a11c4a8c123d6aec
CHC1|40.0|-3.0|ENV1.HST1.MTH17aa2b79a8d00f653b06e9884548d03cabbd6fe5be29fadc77c330a5e7b60a0ed
DHC1|0.0|0.1|ENV1.HST1.MTH18a96edc559ebc9ff85fe4bbab1be7c421e75975de938d76973b6edc37c529f32
EHC1|39.1|-0.4|ENV1.HST1.MTH1aa17cafb684e8dd200b0a92e5c97598136b538600cb4ddd7cec8f83f8615b63c
FHC1|0.0|0.0|ENV1.HST1.MTH101114c759aef655ed82828699e28593cebb99746f527e98dee4a53c17c5c8836
GHC1|90.0|0.0|ENV1.HST1.MTH1a6c943e7b0972c20ab15c349745ae5dc145641bc2692296edbb628d2ce080c38
HHC1|-90.0|0.0|ENV1.HST1.MTH174643bd03741e2b0c3425b2899af63c970a6698eb6dd4ac934d918bd899a9810
IHC1|0.0|-180.0|ENV1.HST1.MTH1ea5e86dc90b66c12c5a860c38344142dbd6910a49e5f183c6025e92a09b48c39
JHC1|0.0|180.0|ENV1.HST1.MTH180a2369bd99eaa98e1fed617e838490a70a362215b2e957b147129a071a011e9
KHC1|-0.000001|0.0|ENV1.HST1.MTH185d7c24bd46c65ef9df5d283a23bfee3d9b86855d42382b494605e7b88bca2a4
LHC1|39.498289|-0.388206|OUTDOOR.GROUND.BURIED.BENEATH_PERSISTENT_OBJECT(compute SHA-256 of canonical string)
MHC1|39.498289|-0.388206|OUTDOOR.GROUND.BURIED.BENEATH_PERSISTENT_OBJECT.AT_FIXED_LANDMARK(compute SHA-256 of canonical string)

Note: Test vectors L and M demonstrate BURIED method with anchor descriptors. Actual SHA-256 hashes depend on computing the hash of the canonical string shown.

Truncation vs. Rounding Demonstration

These examples confirm that digits beyond 6 decimal places are discarded, never rounded.

Raw InputTruncated (6 dp max)Canonical Output
39.498333539.498333 (digit 5 discarded)39.498333
-0.3883335-0.388333 (digit 5 discarded)-0.388333
12.345678912.345678 (digit 9 discarded)12.345678
1.100000001.100000 → strip zeros → 1.11.1

8. Implementation Checklist

  • Canonical string uses base template HC1|<LAT>|<LON>|<ENV>.<HST>.<METHOD> or extended template HC1|<LAT>|<LON>|<ENV>.<HST>.<METHOD>.<ANCHOR_1>[.<ANCHOR_2>] for BURIED method.
  • Base descriptor tokens (ENV, HST, METHOD) are uppercased and validated against ^[A-Z0-9_-]+$, length 1–50.
  • Anchor tokens (if present) are uppercased, validated against ^[A-Z0-9_-]+$, and must start with one of eight positional prefixes (BENEATH_, AT_, BY_, ON_, IN_, NEAR_, BEHIND_, UNDER_).
  • Anchor tokens must not describe transient or movable objects.
  • METHOD = BURIED requires at least one anchor token (ANCHOR_1). Implementation MUST reject BURIED without anchors.
  • Non-BURIED methods MUST reject any anchor tokens. Implementation MUST reject anchors for non-BURIED methods.
  • All tokens reject |, ., whitespace, and empty values.
  • Coordinates reject whitespace, commas, scientific notation, and DMS.
  • Coordinate canonicalization is pure string-based (no floats).
  • Fractional digits are truncated to 6 (never rounded).
  • Trailing zeros are stripped; bare decimal becomes .0.
  • Leading zeros are stripped from the integer part.
  • -0.0 normalizes to 0.0.
  • Leading + is removed.
  • Latitude range: [-90, 90]. Longitude range: [-180, 180]. Both bounds inclusive.
  • -180 and +180 longitude are both valid and not normalized.
  • SHA-256 is computed over UTF-8 bytes of the canonical string.
  • Hash output is lowercase hexadecimal, 64 characters.
  • All test vectors in §7 produce matching hashes.

9. Versioning and Immutability

  • This document defines v1.0.0 of the HashClue serialization and hashing specification.
  • Once a round begins, this specification is immutable for that round.
  • No corrections, clarifications, or errata apply retroactively to an active round.
  • Any fixes or changes MUST be published as a new version and SHALL apply only to future rounds.