postgresql - Write Spark DataFrame to Postgres database


The Spark cluster configuration is as follows:

conf['sparkConfiguration'] = SparkConf() \
    .setMaster('yarn-client') \
    .setAppName("test") \
    .set("spark.executor.memory", "20g") \
    .set("spark.driver.maxResultSize", "20g") \
    .set("spark.executor.instances", "20") \
    .set("spark.executor.cores", "3") \
    .set("spark.memory.fraction", "0.2") \
    .set("user", "test_user") \
    .set("spark.executor.extraClassPath", "/usr/share/java/postgresql-jdbc3.jar")

When I try to write the DataFrame to the Postgres DB using the following code:

from pyspark.sql import DataFrameWriter

my_writer = DataFrameWriter(df)

url_connect = "jdbc:postgresql://198.123.43.24:1234"
table = "test_result"
mode = "overwrite"
properties = {"user": "postgres", "password": "password"}

my_writer.jdbc(url_connect, table, mode, properties)

I encounter the error below:

Py4JJavaError: An error occurred while calling o1120.jdbc.
: java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:278)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$2.apply(JdbcUtils.scala:50)
    at scala.Option.getOrElse(Option.scala:120)
    at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.createConnectionFactory(JdbcUtils.scala:49)
    at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:278)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:209)
    at java.lang.Thread.run(Thread.java:745)

Can anyone provide suggestions on this? Thank you!

Have you downloaded the PostgreSQL JDBC driver? You can download it here: https://jdbc.postgresql.org/download.html.
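Once the jar is available to the driver, it can also help to name the driver class explicitly in the connection properties. Below is a minimal sketch based on the code in the question; the database name in the URL is a placeholder I added (the original URL omits it), and the explicit "driver" entry is an assumption on my part, using the standard class name shipped in the PostgreSQL JDBC jar:

# df is the DataFrame from the question; host, port, database and credentials are placeholders.
url_connect = "jdbc:postgresql://198.123.43.24:1234/your_database"
properties = {
    "user": "postgres",
    "password": "password",
    # Naming the driver class explicitly avoids the DriverManager lookup that fails above.
    "driver": "org.postgresql.Driver",
}

# Equivalent to DataFrameWriter(df).jdbc(...); overwrites the target table if it exists.
df.write.jdbc(url_connect, "test_result", mode="overwrite", properties=properties)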

For the pyspark shell, use the SPARK_CLASSPATH environment variable:

$ export SPARK_CLASSPATH=/path/to/downloaded/jar
$ pyspark

For submitting a script via spark-submit, use the --driver-class-path flag:

$ spark-submit --driver-class-path /path/to/downloaded/jar script.py 
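If the executors also need the driver (for example, the write tasks themselves fail with the same error), the jar can additionally be shipped to them with the --jars flag; a sketch, with the jar path as a placeholder:

$ spark-submit --driver-class-path /path/to/downloaded/jar \
    --jars /path/to/downloaded/jar \
    script.py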
