Skip to content

Schema Operations

This section documents the schema operations functions used throughout the NDF Studio backend.

backend.core.schema_ops

NDF Studio Schema Operations

This module provides functions for managing schema definitions in the NDF Studio backend. It handles attribute types, relation types, and node types, including loading, saving, validation, and CNL (Controlled Natural Language) parsing for schema definitions.

Functions:

Name Description
ordered_schema_dict

Create an ordered dictionary with specified key order

ensure_schema_file

Ensure a schema file exists with default data

load_schema

Load schema data from file with default fallback

validate_schema_entry

Validate that a schema entry has required keys

save_schema

Save schema data to file with proper formatting

load_schema_json

Load JSON schema with default data creation

create_attribute_type_from_dict

Create attribute type from dictionary data

create_relation_type_from_dict

Create relation type from dictionary data

parse_cnl_block

Parse CNL block for schema definitions

filter_used_schema

Filter schema to only include used types

Functions

ordered_schema_dict(entry: dict, key_order: list[str]) -> OrderedDict

Create an ordered dictionary with specified key order.

This function creates an OrderedDict from a regular dictionary, ensuring that keys appear in the specified order. Keys not in the order list are appended at the end.

Parameters:

Name Type Description Default
entry dict

Input dictionary to reorder

required
key_order list[str]

List of keys in desired order

required

Returns:

Name Type Description
OrderedDict OrderedDict

Dictionary with keys in specified order

Example

entry = {"description": "A type", "name": "MyType", "extra": "value"} ordered = ordered_schema_dict(entry, ["name", "description"]) list(ordered.keys()) ['name', 'description', 'extra']

Source code in backend/core/schema_ops.py
def ordered_schema_dict(entry: dict, key_order: list[str]) -> OrderedDict:
    """
    Create an ordered dictionary with specified key order.

    This function creates an OrderedDict from a regular dictionary, ensuring
    that keys appear in the specified order. Keys not in the order list
    are appended at the end.

    Args:
        entry (dict): Input dictionary to reorder
        key_order (list[str]): List of keys in desired order

    Returns:
        OrderedDict: Dictionary with keys in specified order

    Example:
        >>> entry = {"description": "A type", "name": "MyType", "extra": "value"}
        >>> ordered = ordered_schema_dict(entry, ["name", "description"])
        >>> list(ordered.keys())
        ['name', 'description', 'extra']
    """
    ordered = OrderedDict()
    for key in key_order:
        if key in entry:
            ordered[key] = entry[key]
    for key in entry:
        if key not in ordered:
            ordered[key] = entry[key]
    return ordered

ensure_schema_file(file_name, default_data)

Ensure a schema file exists with default data.

This function checks if a schema file exists and creates it with default data if it doesn't exist. It also ensures the schema directory exists.

Parameters:

Name Type Description Default
file_name str

Name of the schema file

required
default_data

Default data to write if file doesn't exist

required

Returns:

Name Type Description
str

Path to the schema file

Example

ensure_schema_file("attribute_types.json", []) 'graph_data/global/attribute_types.json'

Source code in backend/core/schema_ops.py
def ensure_schema_file(file_name, default_data):
    """
    Ensure a schema file exists with default data.

    This function checks if a schema file exists and creates it with default data
    if it doesn't exist. It also ensures the schema directory exists.

    Args:
        file_name (str): Name of the schema file
        default_data: Default data to write if file doesn't exist

    Returns:
        str: Path to the schema file

    Example:
        >>> ensure_schema_file("attribute_types.json", [])
        'graph_data/global/attribute_types.json'
    """
    file_path = os.path.join(GLOBAL_SCHEMA_PATH, file_name)
    if not os.path.exists(file_path):
        os.makedirs(GLOBAL_SCHEMA_PATH, exist_ok=True)
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(default_data, f, indent=2)
    return file_path

load_schema(file_name, default_data)

Load schema data from file with default fallback.

This function loads schema data from a file. If the file doesn't exist, it creates it with the default data and returns the default data.

Parameters:

Name Type Description Default
file_name str

Name of the schema file

required
default_data

Default data to use if file doesn't exist

required

Returns:

Name Type Description
list

Schema data from file or default data

Example

schema = load_schema("attribute_types.json", []) isinstance(schema, list) True

Source code in backend/core/schema_ops.py
def load_schema(file_name, default_data):
    """
    Load schema data from file with default fallback.

    This function loads schema data from a file. If the file doesn't exist,
    it creates it with the default data and returns the default data.

    Args:
        file_name (str): Name of the schema file
        default_data: Default data to use if file doesn't exist

    Returns:
        list: Schema data from file or default data

    Example:
        >>> schema = load_schema("attribute_types.json", [])
        >>> isinstance(schema, list)
        True
    """
    file_path = ensure_schema_file(file_name, default_data)
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    return data or default_data

validate_schema_entry(entry: dict, required_keys: list[str], file_name: str) -> None

Validate that a schema entry has required keys.

This function checks if a schema entry contains all required keys. If any required keys are missing, it raises a ValueError with details.

Parameters:

Name Type Description Default
entry dict

Schema entry to validate

required
required_keys list[str]

List of required keys

required
file_name str

Name of the schema file for error reporting

required

Raises:

Type Description
ValueError

If required keys are missing

Example

entry = {"name": "MyType"} validate_schema_entry(entry, ["name", "description"], "test.json") Traceback (most recent call last): ValueError: Missing keys in test.json entry: ['description'] → {'name': 'MyType'}

Source code in backend/core/schema_ops.py
def validate_schema_entry(entry: dict, required_keys: list[str], file_name: str) -> None:
    """
    Validate that a schema entry has required keys.

    This function checks if a schema entry contains all required keys.
    If any required keys are missing, it raises a ValueError with details.

    Args:
        entry (dict): Schema entry to validate
        required_keys (list[str]): List of required keys
        file_name (str): Name of the schema file for error reporting

    Raises:
        ValueError: If required keys are missing

    Example:
        >>> entry = {"name": "MyType"}
        >>> validate_schema_entry(entry, ["name", "description"], "test.json")
        Traceback (most recent call last):
        ValueError: Missing keys in test.json entry: ['description'] → {'name': 'MyType'}
    """
    missing = [key for key in required_keys if key not in entry]
    if missing:
        raise ValueError(f"Missing keys in {file_name} entry: {missing} → {entry}")

save_schema(file_name, data: list[dict])

Save schema data to file with proper formatting.

This function saves schema data to a file with proper key ordering and validation. It determines the appropriate key order based on the file name and validates each entry before saving.

Parameters:

Name Type Description Default
file_name str

Name of the schema file

required
data list[dict]

List of schema entries to save

required
Example

data = [{"name": "MyType", "description": "A type"}] save_schema("attribute_types.json", data)

Source code in backend/core/schema_ops.py
def save_schema(file_name, data: list[dict]):
    """
    Save schema data to file with proper formatting.

    This function saves schema data to a file with proper key ordering
    and validation. It determines the appropriate key order based on
    the file name and validates each entry before saving.

    Args:
        file_name (str): Name of the schema file
        data (list[dict]): List of schema entries to save

    Example:
        >>> data = [{"name": "MyType", "description": "A type"}]
        >>> save_schema("attribute_types.json", data)
    """
    file_path = os.path.join(GLOBAL_SCHEMA_PATH, file_name)

    # Choose key order based on file
    file_str = str(file_name)
    if "attribute" in file_str:
        key_order = ATTRIBUTE_TYPE_KEYS
    elif "relation" in file_str:
        key_order = RELATION_TYPE_KEYS
    elif "node" in file_str:
        key_order = NODE_TYPE_KEYS
    else:
        key_order = []


    formatted = []
    for entry in sorted(data, key=lambda x: x.get("name", "")):
        validate_schema_entry(entry, key_order, file_name)
        formatted.append(ordered_schema_dict(entry, key_order))

    with open(file_path, "w", encoding="utf-8") as f:
        json.dump(formatted, f, indent=2)

load_schema_json(file_name: str, default_data: list)

Load JSON schema with default data creation.

This function loads JSON schema data from a file. If the file is empty or contains None, it writes the default data and returns it.

Parameters:

Name Type Description Default
file_name str

Name of the schema file

required
default_data list

Default data to use if file is empty

required

Returns:

Name Type Description
list

Schema data from file or default data

Example

schema = load_schema_json("relation_types.json", []) isinstance(schema, list) True

Source code in backend/core/schema_ops.py
def load_schema_json(file_name: str, default_data: list):
    """
    Load JSON schema with default data creation.

    This function loads JSON schema data from a file. If the file is empty
    or contains None, it writes the default data and returns it.

    Args:
        file_name (str): Name of the schema file
        default_data (list): Default data to use if file is empty

    Returns:
        list: Schema data from file or default data

    Example:
        >>> schema = load_schema_json("relation_types.json", [])
        >>> isinstance(schema, list)
        True
    """
    file_path = ensure_schema_file(file_name, default_data)
    with open(file_path, encoding="utf-8") as f:
        data = json.load(f)
    if data is None:
        with open(file_path, "w", encoding="utf-8") as f:
            json.dump(default_data, f, indent=2)
        return default_data
    return data

create_attribute_type_from_dict(data: dict)

Create attribute type from dictionary data.

This function creates a new attribute type from dictionary data and adds it to the attribute types schema. If an attribute type with the same name already exists, the function returns without making changes.

Parameters:

Name Type Description Default
data dict

Attribute type data with keys: name, data_type, unit, applicable_classes

required
Example

data = { ... "name": "mass", ... "data_type": "float", ... "unit": "kg", ... "applicable_classes": ["atom", "molecule"] ... } create_attribute_type_from_dict(data)

Source code in backend/core/schema_ops.py
def create_attribute_type_from_dict(data: dict):
    """
    Create attribute type from dictionary data.

    This function creates a new attribute type from dictionary data and adds it
    to the attribute types schema. If an attribute type with the same name
    already exists, the function returns without making changes.

    Args:
        data (dict): Attribute type data with keys: name, data_type, unit, applicable_classes

    Example:
        >>> data = {
        ...     "name": "mass",
        ...     "data_type": "float",
        ...     "unit": "kg",
        ...     "applicable_classes": ["atom", "molecule"]
        ... }
        >>> create_attribute_type_from_dict(data)
    """
    attr_types = load_schema("attribute_types.json", default_data=[])
    existing_names = {a["name"] for a in attr_types}
    if data["name"] in existing_names:
        return  # or raise or skip silently

    attr_types.append(OrderedDict([
        ("name", data["name"]),
        ("data_type", data["data_type"]),
        ("unit", data["unit"]),
        ("applicable_classes", data["applicable_classes"]),
    ]))
    save_schema("attribute_types.json", attr_types)

create_relation_type_from_dict(data: dict)

Create relation type from dictionary data.

This function creates a new relation type from dictionary data and adds it to the relation types schema. If a relation type with the same name already exists, the function returns without making changes.

Parameters:

Name Type Description Default
data dict

Relation type data with keys: name, inverse, domain, range

required
Example

data = { ... "name": "bonds_with", ... "inverse": "bonded_by", ... "domain": "atom", ... "range": "atom" ... } create_relation_type_from_dict(data)

Source code in backend/core/schema_ops.py
def create_relation_type_from_dict(data: dict):
    """
    Create relation type from dictionary data.

    This function creates a new relation type from dictionary data and adds it
    to the relation types schema. If a relation type with the same name
    already exists, the function returns without making changes.

    Args:
        data (dict): Relation type data with keys: name, inverse, domain, range

    Example:
        >>> data = {
        ...     "name": "bonds_with",
        ...     "inverse": "bonded_by",
        ...     "domain": "atom",
        ...     "range": "atom"
        ... }
        >>> create_relation_type_from_dict(data)
    """
    rel_types = load_schema("relation_types.json", default_data=[])
    existing_names = {r["name"] for r in rel_types}
    if data["name"] in existing_names:
        return

    rel_types.append(OrderedDict([
        ("name", data["name"]),
        ("inverse", data["inverse"]),
        ("domain", data["domain"]),
        ("range", data["range"]),
    ]))
    save_schema("relation_types.json", rel_types)

parse_cnl_block(block: str) -> list[dict]

Parse CNL block for schema definitions.

This function parses a Controlled Natural Language (CNL) block to extract schema definitions for attributes and relations. It supports the following CNL patterns: - "define attribute 'name' as a type with unit 'unit' applicable to classes." - "define relation 'name' with inverse 'inverse' between 'domain' and 'range'."

Parameters:

Name Type Description Default
block str

CNL block containing schema definitions

required

Returns:

Type Description
list[dict]

list[dict]: List of parsed schema statements

Example

cnl = ''' ... define attribute 'mass' as a float with unit 'kg' applicable to atom, molecule. ... define relation 'bonds_with' with inverse 'bonded_by' between 'atom' and 'atom'. ... ''' statements = parse_cnl_block(cnl) len(statements) 2

Source code in backend/core/schema_ops.py
def parse_cnl_block(block: str) -> list[dict]:
    """
    Parse CNL block for schema definitions.

    This function parses a Controlled Natural Language (CNL) block to extract
    schema definitions for attributes and relations. It supports the following
    CNL patterns:
    - "define attribute 'name' as a type with unit 'unit' applicable to classes."
    - "define relation 'name' with inverse 'inverse' between 'domain' and 'range'."

    Args:
        block (str): CNL block containing schema definitions

    Returns:
        list[dict]: List of parsed schema statements

    Example:
        >>> cnl = '''
        ... define attribute 'mass' as a float with unit 'kg' applicable to atom, molecule.
        ... define relation 'bonds_with' with inverse 'bonded_by' between 'atom' and 'atom'.
        ... '''
        >>> statements = parse_cnl_block(cnl)
        >>> len(statements)
        2
    """
    lines = block.strip().splitlines()
    statements = []
    for line in lines:
        line = line.strip()

        # --- Define attribute ---
        if line.lower().startswith("define attribute"):
            m = re.match(
                r"define attribute '(.+?)' as a (\w+)(?: with unit '(.+?)')?(?: applicable to (.+?))?\.", line)
            if m:
                name, data_type, unit, classes = m.groups()
                statements.append({
                    "type": "define_attribute",
                    "name": name,
                    "data_type": data_type,
                    "unit": unit or "",
                    "applicable_classes": [c.strip(" '") for c in classes.split(",")] if classes else []
                })

        # --- Define relation ---
        elif line.lower().startswith("define relation"):
            m = re.match(
                r"define relation '(.+?)' with inverse '(.+?)'(?: between '(.+?)' and '(.+?)')?\.", line)
            if m:
                name, inverse, domain, range_ = m.groups()
                statements.append({
                    "type": "define_relation",
                    "name": name,
                    "inverse": inverse,
                    "domain": domain,
                    "range": range_,
                })

        # [existing parsing continues...]
    return statements

filter_used_schema(parsed_json_path, relation_schema_path, attribute_schema_path, output_path)

Filters only the used relation and attribute types from the global schema and writes them into used_schema.json.

This function analyzes a parsed graph to identify which relation and attribute types are actually used, then creates a filtered schema containing only those types. This is useful for creating lightweight schemas for specific graphs.

Parameters:

Name Type Description Default
parsed_json_path str

Path to the parsed graph JSON file

required
relation_schema_path str

Path to the global relation schema file

required
attribute_schema_path str

Path to the global attribute schema file

required
output_path str

Path where the filtered schema will be written

required

Returns:

Name Type Description
dict

The filtered schema containing only used types

Example

filter_used_schema( ... "parsed_graph.json", ... "relation_types.json", ... "attribute_types.json", ... "used_schema.json" ... ) {'relation_types': [...], 'attribute_types': [...]}

Source code in backend/core/schema_ops.py
def filter_used_schema(parsed_json_path, relation_schema_path, attribute_schema_path, output_path):
    """
    Filters only the used relation and attribute types from the global schema
    and writes them into used_schema.json.

    This function analyzes a parsed graph to identify which relation and attribute
    types are actually used, then creates a filtered schema containing only those
    types. This is useful for creating lightweight schemas for specific graphs.

    Args:
        parsed_json_path (str): Path to the parsed graph JSON file
        relation_schema_path (str): Path to the global relation schema file
        attribute_schema_path (str): Path to the global attribute schema file
        output_path (str): Path where the filtered schema will be written

    Returns:
        dict: The filtered schema containing only used types

    Example:
        >>> filter_used_schema(
        ...     "parsed_graph.json",
        ...     "relation_types.json", 
        ...     "attribute_types.json",
        ...     "used_schema.json"
        ... )
        {'relation_types': [...], 'attribute_types': [...]}
    """
    # Load parsed graph
    with open(parsed_json_path, 'r') as f:
        parsed_data = json.load(f)

    # Collect used relation and attribute names
    used_relation_names = set()
    used_attribute_names = set()

    for node in parsed_data.get("nodes", []):
        for rel in node.get("relations", []):
            used_relation_names.add(rel["name"])
        for attr in node.get("attributes", []):
            used_attribute_names.add(attr["name"])

    # Load global schemas
    with open(relation_schema_path, 'r') as f:
        global_relations = json.load(f)
    with open(attribute_schema_path, 'r') as f:
        global_attributes = json.load(f)

    # Filter schemas
    used_relations = [r for r in global_relations if r["name"] in used_relation_names]
    used_attributes = [a for a in global_attributes if a["name"] in used_attribute_names]

    # Compose output
    used_schema = {
        "relation_types": used_relations,
        "attribute_types": used_attributes
    }

    # Write to file
    with open(output_path, "w") as f:
        json.dump(used_schema, f, indent=2, sort_keys=False)

    return used_schema