Interface ScalarFunction


public interface ScalarFunction
A Java-implemented scalar SQL function. Implementations declare their own name, signature, and volatility, and evaluate one input batch at a time.

Mirrors DataFusion's ScalarUDFImpl trait. Wrap an instance in a ScalarUdf and register it via SessionContext.registerUdf(ScalarUdf) to make it callable from SQL or DataFrame plans.

Implementations may be invoked concurrently by DataFusion on multiple worker threads. An implementation that carries mutable state must synchronize access to it.
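As an illustration of the concurrency note above, here is a minimal sketch of mutable state that is safe to update from several worker threads. It uses only the JDK. InvocationStats, recordBatch, and simulate are hypothetical names, not part of this API; the assumption is that an implementation would hold such an object as a field and update it from evaluate.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper that a ScalarFunction implementation might keep as a field. */
final class InvocationStats {
    // AtomicLong makes each update atomic without explicit locking.
    private final AtomicLong batches = new AtomicLong();
    private final AtomicLong rows = new AtomicLong();

    /** Safe to call from any number of threads at once. */
    void recordBatch(int rowCount) {
        batches.incrementAndGet();
        rows.addAndGet(rowCount);
    }

    long batches() { return batches.get(); }

    long rows() { return rows.get(); }

    /** Simulates concurrent invocations by submitting calls to a fixed thread pool. */
    static InvocationStats simulate(int threads, int calls, int rowsPerCall) {
        InvocationStats stats = new InvocationStats();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < calls; i++) {
            pool.submit(() -> stats.recordBatch(rowsPerCall));
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        InvocationStats stats = simulate(4, 1000, 8);
        System.out.println(stats.batches() + " batches, " + stats.rows() + " rows");
    }
}
```

Atomic counters suit independent tallies like these. State whose fields must change together would more likely need a lock or an immutable snapshot swapped in atomically.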

  • Method Summary

    Modifier and Type
    Method
    Description
    List<org.apache.arrow.vector.types.pojo.Field>
    argFields()
    Declared argument fields, in positional order.
    ColumnarValue
    evaluate(org.apache.arrow.memory.BufferAllocator allocator, ScalarFunctionArgs args)
    Compute the function result for one input batch.
    String
    name()
    SQL name under which this function is invoked (e.g., "add_one").
    org.apache.arrow.vector.types.pojo.Field
    returnField()
    Declared return field.
    Volatility
    volatility()
    Volatility classification.
  • Method Details

    • name

      String name()
      SQL name under which this function is invoked (e.g., "add_one").
    • argFields

      List<org.apache.arrow.vector.types.pojo.Field> argFields()
      Declared argument fields, in positional order. The function is registered with an exact signature; calls whose argument types do not match exactly are rejected.

      Each entry is an Arrow Field -- a name plus a FieldType plus an optional list of child fields. Use Field.nullable(String, org.apache.arrow.vector.types.pojo.ArrowType) for primitive types (e.g. Field.nullable("arg0", new ArrowType.Int(32, true))). Nested types like List, Struct, and Map require the children list to carry element / member / key / value type information; constructing a Field via new Field(name, FieldType, children) is the canonical Arrow way to do that.

    • returnField

      org.apache.arrow.vector.types.pojo.Field returnField()
      Declared return field. The vector of the returned ColumnarValue must have this exact type, including any nested children. The same construction rules apply as for argFields().
    • volatility

      Volatility volatility()
      Volatility classification. Use Volatility.IMMUTABLE for pure functions, Volatility.STABLE for functions deterministic within a query, and Volatility.VOLATILE for non-deterministic functions.
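To make the three classifications concrete, the sketch below pairs each one with a toy function. It is JDK-only and illustrative. The nested Volatility enum is a local stand-in that mirrors the constant names given above, not the library's type. QueryClock, "query_start", and "random_value" are hypothetical names chosen for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

final class VolatilitySketch {
    /** Local stand-in mirroring the three constants named in the text. */
    enum Volatility { IMMUTABLE, STABLE, VOLATILE }

    /** IMMUTABLE: the same input always yields the same output. */
    static int addOne(int x) {
        return x + 1;
    }

    /** STABLE: the value is fixed for one query but may differ between queries. */
    static final class QueryClock {
        private final long startMillis;

        QueryClock(long startMillis) {
            this.startMillis = startMillis;
        }

        long queryStart() {
            return startMillis;
        }
    }

    /** VOLATILE: repeated calls may return different values, even for the same row. */
    static double randomValue() {
        return ThreadLocalRandom.current().nextDouble();
    }

    /** Maps the toy function names used here to a classification. */
    static Volatility volatilityOf(String name) {
        switch (name) {
            case "add_one": return Volatility.IMMUTABLE;
            case "query_start": return Volatility.STABLE;
            case "random_value": return Volatility.VOLATILE;
            default: throw new IllegalArgumentException("unknown function: " + name);
        }
    }

    public static void main(String[] args) {
        System.out.println(volatilityOf("add_one")); // prints IMMUTABLE
    }
}
```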
    • evaluate

      ColumnarValue evaluate(org.apache.arrow.memory.BufferAllocator allocator, ScalarFunctionArgs args)
      Compute the function result for one input batch.
      Parameters:
      allocator - the BufferAllocator that MUST be used for any new Arrow vector allocation, including the result. Buffers allocated from other allocators will not survive the JNI handoff.
      args - the per-arg ColumnarValues and the batch row count. Each ColumnarValue is a read-only view; the implementation must NOT close its underlying vector.
      Returns:
      a ColumnarValue of the declared return type. If ColumnarValue.Array, the underlying vector must have length args.rowCount(); if ColumnarValue.Scalar, length 1. Ownership of the returned vector transfers to the framework; the implementation must NOT close it.
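The length rules above can be sketched without any Arrow dependency. In this illustrative example, plain Integer[] arrays stand in for Arrow vectors and a null element stands in for a null slot. AddOneBatchSketch and its methods are hypothetical names. A real implementation would presumably allocate its result vector from the supplied allocator and wrap it in a ColumnarValue, which this sketch does not attempt to show.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a batch-oriented add_one kernel; not the library's API. */
final class AddOneBatchSketch {

    /** Array case: one output slot per input row, and null rows stay null. */
    static Integer[] addOneArray(Integer[] input, int rowCount) {
        if (input.length != rowCount) {
            throw new IllegalArgumentException(
                "input length " + input.length + " != rowCount " + rowCount);
        }
        Integer[] out = new Integer[rowCount];
        for (int i = 0; i < rowCount; i++) {
            out[i] = (input[i] == null) ? null : input[i] + 1;
        }
        return out;
    }

    /** Scalar case: a single value in, a length-1 result out. */
    static Integer[] addOneScalar(Integer input) {
        return new Integer[] { input == null ? null : input + 1 };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(addOneArray(new Integer[] {1, null, 41}, 3))); // prints [2, null, 42]
        System.out.println(Arrays.toString(addOneScalar(9))); // prints [10]
    }
}
```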