Hadoop shell commands vs. Java APIs
I faced a performance issue while writing a utility to delete files and directories older than a week from HDFS, iteratively within a loop (I was invoking the shell command from a shell script). I had checks to perform on each file/sub-directory before deleting it, so I could not simply delete the entire root directory. Each delete through the shell command was taking around 2 sec. With 1600 files and subdirectories to delete, it was taking close to 1 hr to get through them. When I instead used the Java APIs (FileStatus and FileSystem), I got a drastic performance gain: the whole run completed in under 5 secs. I would like to know in detail the fundamental reason behind this. I have read that the shell commands (like -rm) internally use the Java APIs, so what is the reason for such a huge difference in response time?
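Well, if you are invoking the shell command iteratively, you have the JVM startup time to contend with on each iteration, while with the API approach you have only one JVM startup. Along the same lines, each invocation also has to connect to the NameNode again, and so on. That fixed cost is paid once per delete: at around 2 sec per invocation, 1600 invocations come to roughly 3200 sec, or about 53 minutes, which matches the "close to 1 hr" you saw.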
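To make the single-JVM approach concrete, here is a minimal sketch of what such a loop might look like with FileSystem and FileStatus. It is not the original poster's code: the class name BulkHdfsDelete, the helper shouldDelete, and the "older than a week" modification-time check are hypothetical stand-ins for whatever per-entry checks you actually need, and it assumes the Hadoop client libraries are on the classpath and that the default Configuration points at your cluster.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkHdfsDelete {

    /**
     * Hypothetical per-entry check (an assumption, not from the original post):
     * an entry qualifies if it was last modified before the cutoff.
     */
    static boolean shouldDelete(FileStatus status, long cutoffMillis) {
        return status.getModificationTime() < cutoffMillis;
    }

    /**
     * Deletes the qualifying direct children of root and returns how many
     * were deleted. Every call reuses the same FileSystem handle, so the JVM
     * is started only once for the whole batch.
     */
    static int deleteMatching(FileSystem fs, Path root, long cutoffMillis)
            throws IOException {
        int deleted = 0;
        for (FileStatus status : fs.listStatus(root)) {
            if (shouldDelete(status, cutoffMillis)) {
                // recursive = true so non-empty subdirectories are removed too
                if (fs.delete(status.getPath(), true)) {
                    deleted++;
                }
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        // Picks up core-site.xml / hdfs-site.xml from the classpath, if present
        Configuration conf = new Configuration();
        // Cutoff of one week ago, in milliseconds since the epoch
        long cutoff = System.currentTimeMillis() - 7L * 24 * 60 * 60 * 1000;
        try (FileSystem fs = FileSystem.get(conf)) {
            int n = deleteMatching(fs, new Path(args[0]), cutoff);
            System.out.println("Deleted " + n + " entries under " + args[0]);
        }
    }
}
```

The point of the sketch is only that listStatus and delete are called from one long-lived process, so the JVM startup and client setup cost is paid once rather than once per file. Whether you see the same "under 5 secs" result will depend on your cluster and on how expensive your own checks are.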