class Table:
| Class | Batch |
The batch interface provided for a hbspark.table.Table. |
| Method | __init__ |
Instantiates a new table object with a given table name. |
| Method | batch |
Retrieve the batch processor of the table which allows for bulk data modification. |
| Method | cell |
Retrieve the cell value (and its history) from the HBase table. |
| Method | delete |
Delete a row from the HBase table. |
| Method | families |
Get all of the column families associated with the HBase table. |
| Method | put |
Insert a new row into the HBase table. |
| Method | regions |
Provides all of the regions associated with a table (between the keys). |
| Method | row |
Get a row from the HBase table. |
| Method | scan |
Retrieve all of the rows inside the HBase table. |
| Instance Variable | _table_ref |
Undocumented |
| Parameters | |
| name:string | The name for the table to be created. |
| Returns | |
| hbspark.table.Table | A new instance of the HBase table. |
| Parameters | |
| timestamp:int | The timestamp all batch commands should use. |
| batch_size:int | The number of queued commands at which the batch is sent automatically. |
| transaction:bool | Whether or not the batch should behave like a transaction (for the purposes of a context manager). |
| wal:bool | Whether or not to write to the WAL. |
| Returns | |
| hbspark.table.Table.Batch | The batch processor for the current table. |
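The batch parameters above can be illustrated with a toy, in-memory model. This is only a sketch and not the hbspark implementation: the class `ToyBatch`, its `pending` and `sent` attributes, and the rule that a transactional batch discards its queue when the `with` block raises are all assumptions made for illustration.

```python
class ToyBatch:
    """Hypothetical in-memory stand-in for a batch processor (not hbspark code)."""

    def __init__(self, batch_size=None, transaction=False):
        self.batch_size = batch_size
        self.transaction = transaction
        self.pending = []  # commands queued but not yet sent
        self.sent = []     # each entry is one group of commands sent together

    def put(self, rowkey, data):
        # Queue the command; send automatically once the queue is full.
        self.pending.append(("put", rowkey, data))
        if self.batch_size is not None and len(self.pending) >= self.batch_size:
            self.send()

    def send(self):
        # Flush whatever is queued as a single group.
        if self.pending:
            self.sent.append(list(self.pending))
            self.pending.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumed rule: a transactional batch drops its queue on error,
        # otherwise the remaining commands are sent when the block exits.
        if exc_type is not None and self.transaction:
            self.pending.clear()
        else:
            self.send()
        return False  # never swallow the exception


with ToyBatch(batch_size=2) as batch:
    batch.put("row-1", {"cf_x:col_x": "a"})
    batch.put("row-2", {"cf_x:col_x": "b"})  # queue reaches 2, sent automatically
    batch.put("row-3", {"cf_x:col_x": "c"})  # sent when the block exits

print(len(batch.sent))
```

In this model the three puts leave the queue in two groups: one triggered by `batch_size` and one by leaving the `with` block.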
| Parameters | |
| rowkey:string | The rowkey for the target cell. |
| column:string | The column name for the target cell. |
| versions:int | The maximum number of cell versions to be retrieved. |
| timestamp:int | The new timestamp for the retrieval. (VF) |
| include_timestamp:bool | Whether or not to include the timestamp in the retrieval. (VF) |
| Returns | |
| list of pyspark.sql.Row | List of each retrieved row from the table. |
Delete a row from the HBase table.
The columns payload should have the following structure:
columns = ["cf_x:col_x", ...]
| Parameters | |
| rowkey:string | The rowkey targeting the row to be deleted. |
| columns:list | The list of column names to delete, each of the form cf:col. |
| timestamp:int | The timestamp for the deletion operation. |
| wal:bool | Whether or not to write to the WAL of HBase. |
| Returns | |
| None | Method does not return. |
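Since delete expects each column name in the form cf:col, it can help to check the list before sending it. The helper below is a minimal sketch; `validate_columns` is a hypothetical name and is not part of hbspark.

```python
def validate_columns(columns):
    """Return the columns as a list, raising if any is not of the form cf:col.

    Hypothetical helper for illustration; not part of hbspark.
    """
    checked = []
    for column in columns:
        family, separator, qualifier = column.partition(":")
        if not (family and separator and qualifier):
            raise ValueError(f"expected 'cf:col', got {column!r}")
        checked.append(column)
    return checked


columns = validate_columns(["cf_x:col_x", "cf_x:col_y"])
print(columns)
```

The resulting list has the structure described for the `columns` parameter and could then be passed to the delete method together with a rowkey.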
| Returns | |
| list | A list of dictionaries representing each column family in the table and its configuration. |
Insert a new row into the HBase table.
The data payload should have the following structure:
data = {
"cf_x:col_x" : "value",
...
}
| Parameters | |
| rowkey:string | The rowkey for the new inserted row. |
| data:dict | The dictionary mapping cf:col to values to be stored. |
| timestamp:int | The timestamp used for the put operation (VF). |
| wal:bool | Whether or not to write to the WAL of HBase. |
| Returns | |
| None | Method does not return. |
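The put payload maps "cf:col" keys to values. If the data starts out grouped by column family, a small helper can flatten it into that shape. This is a sketch only; `build_put_payload` is a hypothetical name and is not part of hbspark.

```python
def build_put_payload(nested):
    """Flatten {family: {qualifier: value}} into {"family:qualifier": value}.

    Hypothetical helper for illustration; not part of hbspark.
    """
    payload = {}
    for family, columns in nested.items():
        for qualifier, value in columns.items():
            payload[f"{family}:{qualifier}"] = value
    return payload


data = build_put_payload({"cf_x": {"col_x": "value", "col_y": "other"}})
print(data)
```

The resulting `data` dictionary has the structure described for the `data` parameter and could then be passed to the put method together with a rowkey.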
| Returns | |
| list | A list of dictionaries representing a region and its configuration. |
| Parameters | |
| rowkey:string | The rowkey for the provided row. |
| columns:list of string | The column names which should be retrieved from the row. |
| timestamp:int | The new timestamp for the retrieval. (VF) |
| include_timestamp:bool | Whether or not to include the timestamp in the retrieval. (VF) |
| Returns | |
| pyspark.sql.Row | The row as a spark manageable data structure. |
| Parameters | |
| schema:StructType | A StructType whose StructField entries have the form ("cf:name", Type(), True). |
| row_start:string | Beginning rowkey of the scan (inclusive). |
| row_stop:string | Ending rowkey of the scan (exclusive). |
| row_prefix:string | A prefix rowkeys must match. |
| columns:list or tuple | The columns that should be returned for each row. |
| filter:string | A string to filter out results. (VF) |
| timestamp:int | The timestamp for the scan. |
| include_timestamp:bool | Whether row timestamps are returned. |
| batch_size:int | The maximum number of results returned in a single batch. |
| scan_batching:bool | Whether or not the server returns results in batches. |
| limit:int | Maximum number of total returned rows. |
| sorted_columns:bool | Whether to return the sorted columns or not. |
| reverse:bool | Whether to perform scans in reverse of natural order. |
| Returns | |
| pyspark.sql.DataFrame | A dataframe that consists of all the rows in the HBase table. |
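The scan bounds are asymmetric: `row_start` is inclusive and `row_stop` is exclusive. The sketch below models that selection on a plain list of rowkeys. It does not call HBase or hbspark, and `keys_in_scan_range` is a hypothetical helper. It simply applies whichever bounds are given; whether the real method allows `row_prefix` together with `row_start` or `row_stop` is not stated here.

```python
def keys_in_scan_range(rowkeys, row_start=None, row_stop=None, row_prefix=None):
    """Select rowkeys the way the documented scan bounds describe.

    Hypothetical pure-Python model for illustration; not part of hbspark.
    """
    selected = []
    for key in sorted(rowkeys):
        if row_prefix is not None and not key.startswith(row_prefix):
            continue
        if row_start is not None and key < row_start:
            continue  # before the inclusive start
        if row_stop is not None and key >= row_stop:
            continue  # at or past the exclusive stop
        selected.append(key)
    return selected


rowkeys = ["row-1", "row-2", "row-3", "row-4"]
print(keys_in_scan_range(rowkeys, row_start="row-2", row_stop="row-4"))
```

With these inputs the start key is kept and the stop key is dropped, so the selection is `row-2` and `row-3`.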