hbspark

package documentation

hbspark is a simple-to-use data pipeline that moves data stored in HBase into PySpark for distributed computation, using HBase's Thrift API.

Module	`table`	Holds all table functionality for hbspark, including creation, deletion, querying, and modification.
Module	`_hb_session`	Undocumented
Module	`_utils`	A class holding common utility functions for hbspark

From __init__.py:

Function	`connect`	Connect the HBase hostname to the provided spark session.

def connect(hostname, spark_session):

Connect the HBase hostname to the provided spark session.

Parameters
hostname:string	The hostname or IP of the HBase thrift gateway.
spark_session:pyspark.sql.SparkSession	The instantiated spark session used to create dataframes.
Returns
None	This function does not return a value.
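
The following is a minimal usage sketch, not part of the package itself. It assumes the package is importable as `hbspark` and that `connect` takes exactly the two documented parameters. The wrapper `connect_checked`, its `connect_fn` parameter, the application name, and the hostname `hbase-thrift.example.org` are hypothetical names introduced for illustration only.

```python
def connect_checked(hostname, spark_session, connect_fn=None):
    """Validate the hostname, then pass both arguments to hbspark.connect.

    connect_fn is a hypothetical hook (not part of hbspark) that lets the
    wrapper be exercised without hbspark or a running HBase Thrift gateway.
    """
    if not isinstance(hostname, str) or not hostname:
        raise ValueError("hostname must be a non-empty string")
    if connect_fn is None:
        # Assumption: the package is importable under the name `hbspark`.
        import hbspark
        connect_fn = hbspark.connect
    # connect is documented as returning None, so nothing is returned here.
    connect_fn(hostname, spark_session)


def example():
    """Typical call; needs pyspark, hbspark, and a reachable Thrift gateway."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hbspark-example").getOrCreate()
    # Hypothetical hostname; replace with your own Thrift gateway.
    connect_checked("hbase-thrift.example.org", spark)
    return spark
```

Because `connect` returns nothing, the session object created by the caller is presumably what later hbspark calls rely on, so it is worth keeping a reference to it, as `example` does.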