postgresql - Write Spark dataframe to Postgres database
The Spark cluster configuration is as follows:
conf['sparkConfiguration'] = SparkConf() \
    .setMaster('yarn-client') \
    .setAppName("test") \
    .set("spark.executor.memory", "20g") \
    .set("spark.driver.maxResultSize", "20g") \
    .set("spark.executor.instances", "20") \
    .set("spark.executor.cores", "3") \
    .set("spark.memory.fraction", "0.2") \
    .set("user", "test_user") \
    .set("spark.executor.extraClassPath", "/usr/share/java/postgresql-jdbc3.jar")
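(For reference, only the executor classpath is set above. A minimal sketch, assuming the same jar path, of a configuration that also exposes the jar to the driver JVM via spark.driver.extraClassPath:

# Hypothetical variant of the configuration above: in addition to the
# executor classpath, put the PostgreSQL JDBC jar on the driver classpath.
from pyspark import SparkConf

conf = SparkConf() \
    .setMaster('yarn-client') \
    .setAppName("test") \
    .set("spark.executor.extraClassPath", "/usr/share/java/postgresql-jdbc3.jar") \
    .set("spark.driver.extraClassPath", "/usr/share/java/postgresql-jdbc3.jar")
)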
When I try to write the dataframe to the Postgres database using the following code:
from pyspark.sql import DataFrameWriter

my_writer = DataFrameWriter(df)
url_connect = "jdbc:postgresql://198.123.43.24:1234"
table = "test_result"
mode = "overwrite"
properties = {"user": "postgres", "password": "password"}

my_writer.jdbc(url_connect, table, mode, properties)
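(For reference, the same write can be expressed through the DataFrame's write attribute rather than constructing a DataFrameWriter directly; a minimal sketch, assuming the same df, URL, and credentials:

# Equivalent write via the DataFrame's write attribute.
df.write.jdbc(url="jdbc:postgresql://198.123.43.24:1234",
              table="test_result",
              mode="overwrite",
              properties={"user": "postgres", "password": "password"})
)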
I encounter the error below:
Py4JJavaError: An error occurred while calling o1120.jdbc.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:278)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)
Can anyone provide suggestions on this? Thank you!
Have you downloaded the PostgreSQL JDBC driver? You can download it here: https://jdbc.postgresql.org/download.html.
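Once the jar is visible to the driver JVM, it can also help to name the driver class explicitly in the connection properties, so java.sql.DriverManager does not have to discover it. A minimal sketch, reusing the variables from the question:

# Name the PostgreSQL driver class explicitly (the jar must still be on the classpath).
properties = {"user": "postgres",
              "password": "password",
              "driver": "org.postgresql.Driver"}
my_writer.jdbc(url_connect, table, mode, properties)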
For the pyspark shell, use the SPARK_CLASSPATH environment variable:
$ export SPARK_CLASSPATH=/path/to/downloaded/jar
$ pyspark
For submitting a script via spark-submit, use the --driver-class-path flag:
$ spark-submit --driver-class-path /path/to/downloaded/jar script.py
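If the executors also need the driver on their classpath (the write itself runs on the executors), the jar can additionally be shipped to them with --jars; a sketch, assuming the same jar path:

$ spark-submit --driver-class-path /path/to/downloaded/jar \
               --jars /path/to/downloaded/jar \
               script.py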