hbspark.table.Table

Class	`Batch`	The batch interface provided for a `hbspark.table.Table`.
Method	`__init__`	Instantiates a new table object with a given table name.
Method	`batch`	Retrieve the batch processor of the table which allows for bulk data modification.
Method	`cell`	Retrieve the cell value (and its history) from the HBase table.
Method	`delete`	Delete a row from the HBase table.
Method	`families`	Gets all of the column families associated with the HBase table.
Method	`put`	Insert a new row into the HBase table.
Method	`regions`	Provides all of the regions associated with a table (between the keys).
Method	`row`	Get a row from the HBase table.
Method	`scan`	Retrieve all of the rows inside the HBase table.
Instance Variable	`_table_ref`	Undocumented

def __init__(self, name):

Instantiates a new table object with a given table name.

Parameters
name:string	The name for the table to be created.
Returns
`hbspark.table.Table`	A new instance of the HBase table.

def batch(self, timestamp=None, batch_size=None, transaction=False, wal=True):

Retrieve the batch processor of the table which allows for bulk data modification.

Parameters
timestamp:int	The timestamp all batch commands should use.
batch_size:int	The queue length at which queued commands are sent automatically (via `send`).
transaction:bool	Whether or not the batch should behave like a transaction (for the purposes of a context manager).
wal:bool	Whether or not to write to the WAL.
Returns
`hbspark.table.Table.Batch`	The batch processor for the current table.
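
The `batch_size` and `transaction` parameters are easier to follow with a small model. The sketch below is an in-memory stand-in, not hbspark's `Batch` class: `SketchBatch`, its `queue` and `sent` attributes, and its `put` method are hypothetical names, and the transaction behavior (discarding queued commands when the `with` block raises) is an assumption based on the parameter description above.

```python
class SketchBatch:
    """In-memory stand-in that models the documented batch parameters.

    This is NOT hbspark's Batch class; names and behavior are illustrative.
    """

    def __init__(self, batch_size=None, transaction=False):
        self.batch_size = batch_size
        self.transaction = transaction
        self.queue = []  # commands waiting to be sent
        self.sent = []   # commands that have been "sent"

    def put(self, rowkey, data):
        self.queue.append(("put", rowkey, data))
        # Send automatically once the queue reaches batch_size.
        if self.batch_size is not None and len(self.queue) >= self.batch_size:
            self.send()

    def send(self):
        self.sent.extend(self.queue)
        self.queue.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Assumed transaction semantics: drop queued commands on error.
        if exc_type is not None and self.transaction:
            self.queue.clear()
        else:
            self.send()
        return False  # never swallow the exception


with SketchBatch(batch_size=2) as batch:
    batch.put("r1", {"cf_x:col_x": "1"})
    batch.put("r2", {"cf_x:col_x": "2"})  # queue reaches 2, so it is sent
    batch.put("r3", {"cf_x:col_x": "3"})  # sent when the block exits

txn = SketchBatch(transaction=True)
try:
    with txn:
        txn.put("r1", {"cf_x:col_x": "1"})
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the queued put was discarded, nothing was sent
```

With hbspark installed, the equivalent entry point would be `table.batch(batch_size=2)`; whether the real batch object matches this model in every detail should be checked against the library itself.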

def cell(self, rowkey, column, versions=None, timestamp=None, include_timestamp=False):

Retrieve the cell value (and its history) from the HBase table.

Parameters
rowkey:string	The rowkey for the target cell.
column:string	The column name for the target cell.
versions:int	The maximum number of cell versions to be retrieved.
timestamp:int	The new timestamp for the retrieval. (VF)
include_timestamp:bool	Whether or not to include the timestamp in the retrieval. (VF)
Returns
`list` of pyspark.sql.Row	List of each retrieved row from the table.

def delete(self, rowkey, columns=None, timestamp=None, wal=True):

Delete a row from the HBase table.

The columns payload should have the following structure:

    columns = ["cf_x:col_x", ...]

Parameters
rowkey:string	The rowkey targeting the row to be deleted.
columns:list	The list of column names to be deleted, each of the form `cf:col`.
timestamp:int	The timestamp for the deletion operation.
wal:bool	Whether or not to write to the WAL of HBase.
Returns
None	Method does not return.
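
Because every entry in `columns` is expected to have the form `cf:col`, it can help to check the list before issuing the delete. The following is a minimal sketch; `check_columns` is a hypothetical helper, not part of hbspark, and the library itself may accept other forms.

```python
def check_columns(columns):
    """Return a copy of the list if every entry looks like "cf:col".

    Hypothetical validation helper for illustration only.
    """
    for name in columns:
        family, sep, qualifier = name.partition(":")
        if not (family and sep and qualifier):
            raise ValueError(f"expected 'cf:col', got {name!r}")
    return list(columns)


columns = check_columns(["cf_x:col_x", "cf_x:col_y"])

# With hbspark installed and HBase reachable, the list would be passed as:
# table.delete("row-1", columns=columns)
```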

def families(self):

Gets all of the column families associated with the HBase table.

Returns
list	A list of dictionaries representing each column family in the table and its configuration.

def put(self, rowkey, data, timestamp=None, wal=True):

Insert a new row into the HBase table.

The data payload should have the following structure:

    data = {
        "cf_x:col_x" : "value",
        ...
    }

Parameters
rowkey:string	The rowkey for the new inserted row.
data:dict	The dictionary mapping `cf:col` to values to be stored.
timestamp:int	The timestamp used for the put operation (VF).
wal:bool	Whether or not to write to the WAL of HBase.
Returns
None	Method does not return.
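
The flat `cf:col` payload described above can be assembled from a nested mapping of families to columns. This is a minimal sketch; `build_put_payload` is a hypothetical helper, not part of hbspark, and it assumes column family names never contain a colon.

```python
def build_put_payload(families):
    """Flatten {family: {qualifier: value}} into {"family:qualifier": value}.

    Hypothetical helper for illustration; it is not part of hbspark.
    """
    payload = {}
    for family, columns in families.items():
        if ":" in family:
            raise ValueError(f"column family {family!r} must not contain ':'")
        for qualifier, value in columns.items():
            payload[f"{family}:{qualifier}"] = value
    return payload


data = build_put_payload({"cf_x": {"col_x": "value", "col_y": "other"}})
# data == {"cf_x:col_x": "value", "cf_x:col_y": "other"}

# With hbspark installed and HBase reachable, the payload would be passed as:
# table.put("row-1", data)
```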

def regions(self):

Provides all of the regions associated with a table (between the keys).

Returns
list	A list of dictionaries, each representing a region and its configuration.

def row(self, rowkey, columns=None, timestamp=None, include_timestamp=False):

Get a row from the HBase table.

Parameters
rowkey:string	The rowkey for the provided row.
columns:`list` of `string`	The column names which should be retrieved from the row.
timestamp:int	The new timestamp for the retrieval. (VF)
include_timestamp:bool	Whether or not to include the timestamp in the retrieval. (VF)
Returns
pyspark.sql.Row	The row as a spark manageable data structure.

def scan(self, schema=None, row_start=None, row_stop=None, row_prefix=None, columns=None, filter=None, timestamp=None, include_timestamp=False, batch_size=1000, scan_batching=None, limit=None, sorted_columns=False, reverse=False):

Retrieve all of the rows inside the HBase table.

Parameters
schema:StructType	A list of StructField with ("cf:name", Type(), True)
row_start:string	Beginning rowkey of the scan (inclusive).
row_stop:string	Ending rowkey of the scan (exclusive).
row_prefix:string	A prefix rowkeys must match.
columns:list or tuple	The columns that should be returned for each row.
filter:string	A filter string used to restrict the results. (VF)
timestamp:int	The timestamp for the scan.
include_timestamp:bool	Whether row timestamps are returned.
batch_size:int	The max size for a single return of retrieving results.
scan_batching:bool	Whether or not the server will return by batching.
limit:int	Maximum number of total returned rows.
sorted_columns:bool	Whether to return the sorted columns or not.
reverse:bool	Whether to perform scans in reverse of natural order.
Returns
pyspark.sql.DataFrame	A dataframe that consists of all the rows in the HBase table.
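
The inclusive `row_start`, exclusive `row_stop`, `row_prefix`, and `limit` parameters can be illustrated on a plain list of rowkeys. The sketch below is a pure-Python model, not hbspark's scan implementation: `select_rowkeys` is a hypothetical function, it assumes rowkeys compare in ordinary string order, and combining a prefix with a start/stop range in one call is an assumption the real `scan` may not allow.

```python
def select_rowkeys(rowkeys, row_start=None, row_stop=None,
                   row_prefix=None, limit=None):
    """Pick rowkeys the way the documented scan bounds describe.

    Hypothetical model for illustration; it does not talk to HBase.
    """
    selected = []
    for key in sorted(rowkeys):
        if limit is not None and len(selected) >= limit:
            break  # stop once the maximum number of rows is reached
        if row_prefix is not None and not key.startswith(row_prefix):
            continue
        if row_start is not None and key < row_start:
            continue  # row_start is inclusive
        if row_stop is not None and key >= row_stop:
            continue  # row_stop is exclusive
        selected.append(key)
    return selected


keys = ["user#001", "user#002", "user#003", "visit#001"]

in_range = select_rowkeys(keys, row_start="user#002", row_stop="visit#001")
# in_range == ["user#002", "user#003"]

by_prefix = select_rowkeys(keys, row_prefix="user#", limit=2)
# by_prefix == ["user#001", "user#002"]
```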

_table_ref =

Undocumented