Overview
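DistCp is a tool that ships with Hadoop. It lives in the hadoop tools package and is small, roughly 1300 lines of code. Its job is to copy files between HDFS file systems. Rather than copying through a single client, DistCp runs the copy as an MR (MapReduce) job, so the files on hdfs are transferred in parallel.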
Usage
DistCp accepts the options listed below:
OPTIONS:
-p[rbugp]              Preserve status
                       r: replication number
                       b: block size
                       u: user
                       g: group
                       p: permission
                       -p alone is equivalent to -prbugp
-i                     Ignore failures
-log <logdir>          Write logs to <logdir>
-m <num_maps>          Maximum number of simultaneous copies
-overwrite             Overwrite destination
-update                Overwrite if src size different from dst size
-f <urilist_uri>       Use list at <urilist_uri> as src list
-filelimit <n>         Limit the total number of files to be <= n
-sizelimit <n>         Limit the total size to be <= n bytes
-delete                Delete the files existing in the dst but not in src
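Of these options, -p, -m and -overwrite are the ones used most often. -p preserves the listed file attributes on the copies, -m caps the number of simultaneous copies (map tasks), and -overwrite replaces files that already exist at the destination. -delete removes the files that exist in dst but not in src, that is, the diff between the two trees. -update overwrites a destination file only when its size differs from the source; since the usage text describes the check purely in terms of size, a file whose content changed while its size stayed the same may not be copied again by distcp, which is worth keeping in mind when distcp is used for synchronization.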
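The way -overwrite, -update and -delete interact can be sketched in plain Java. This is only an illustration, not DistCp's code: the class UpdateDeleteSketch, its methods and the path-to-size maps are hypothetical, and the sketch assumes that an existing destination file is skipped when neither -overwrite nor -update applies. The only rules taken from the usage text are that -update compares sizes and that -delete removes files present in dst but absent from src.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from DistCp.
class UpdateDeleteSketch {

    // Decide which source files get copied.
    // src and dst map a relative path to its size in bytes.
    static List<String> toCopy(Map<String, Long> src, Map<String, Long> dst,
                               boolean overwrite, boolean update) {
        List<String> copy = new ArrayList<>();
        for (Map.Entry<String, Long> e : src.entrySet()) {
            Long dstSize = dst.get(e.getKey());
            boolean missing = dstSize == null;
            boolean sizeDiffers = !missing && !dstSize.equals(e.getValue());
            // Assumption: an existing file is skipped unless a flag says otherwise.
            if (missing || overwrite || (update && sizeDiffers)) {
                copy.add(e.getKey());
            }
        }
        return copy;
    }

    // -delete: files that exist in dst but not in src.
    static List<String> toDelete(Map<String, Long> src, Map<String, Long> dst) {
        List<String> delete = new ArrayList<>();
        for (String path : dst.keySet()) {
            if (!src.containsKey(path)) {
                delete.add(path);
            }
        }
        return delete;
    }

    public static void main(String[] args) {
        Map<String, Long> src = new LinkedHashMap<>();
        src.put("a", 10L);
        src.put("b", 20L);
        src.put("c", 30L);
        Map<String, Long> dst = new LinkedHashMap<>();
        dst.put("a", 10L);   // same size as src; content is not compared
        dst.put("b", 25L);   // size differs from src
        dst.put("d", 5L);    // not present in src

        System.out.println(toCopy(src, dst, false, true));  // [b, c]
        System.out.println(toCopy(src, dst, true, false));  // [a, b, c]
        System.out.println(toDelete(src, dst));             // [d]
    }
}
```

In the -update run, file a is left alone even if its content had changed, because only the sizes are compared in this sketch.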
Source Code Analysis
DistCp implements the org.apache.hadoop.util.Tool interface. That interface declares a single method, "int run(String[] args) throws Exception;", and ToolRunner calls it to run the tool.
Before the job is submitted, DistCp has to process the source and destination paths. That work is done in the setup method:
    private static void setup(Configuration conf, JobConf jobConf,
            final Arguments args)
This method does most of DistCp's preparation. It walks the source paths and generates, for the Mapper to consume, two files named "_distcp_src_files" and "_distcp_dst_files". Both are SequenceFiles, that is, files of serialized Key/Value records, and together they hold the source and destination file lists. In _distcp_src_files the key is the file size (0 for a directory) and the value is a FilePair, a Writable that wraps an org.apache.hadoop.fs.FileStatus together with a path. In _distcp_dst_files the key is the destination path and the value is the FileStatus. Still inside setup, DistCp then checks the destination file list it has just generated for duplicate entries.
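The record layout of the two list files can be pictured with a few plain Java classes. This is a sketch under stated assumptions: Status, SrcRecord and DstRecord are hypothetical stand-ins for FileStatus, the size/FilePair record and the path/FileStatus record, and nothing here touches a real SequenceFile or the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real lists are SequenceFiles of Writable objects.
class ListFilesSketch {

    // Stands in for org.apache.hadoop.fs.FileStatus.
    static final class Status {
        final String path;
        final boolean isDir;
        final long length;

        Status(String path, boolean isDir, long length) {
            this.path = path;
            this.isDir = isDir;
            this.length = length;
        }
    }

    // One _distcp_src_files record: size key plus a FilePair-like value.
    static final class SrcRecord {
        final long key;       // file size in bytes, 0 for a directory
        final Status input;   // source status held by the pair
        final String output;  // path held by the pair

        SrcRecord(long key, Status input, String output) {
            this.key = key;
            this.input = input;
            this.output = output;
        }
    }

    // One _distcp_dst_files record: destination path key, status value.
    static final class DstRecord {
        final String key;
        final Status value;

        DstRecord(String key, Status value) {
            this.key = key;
            this.value = value;
        }
    }

    static SrcRecord srcRecord(Status s, String dstPath) {
        return new SrcRecord(s.isDir ? 0L : s.length, s, dstPath);
    }

    static DstRecord dstRecord(Status s, String dstPath) {
        return new DstRecord(dstPath, s);
    }

    // Summing the src keys gives the total number of bytes to copy.
    static long totalBytes(List<SrcRecord> records) {
        long total = 0L;
        for (SrcRecord r : records) {
            total += r.key;
        }
        return total;
    }

    public static void main(String[] args) {
        List<SrcRecord> srcList = new ArrayList<>();
        srcList.add(srcRecord(new Status("/in/logs", true, 4096L), "logs"));
        srcList.add(srcRecord(new Status("/in/logs/a.log", false, 700L), "logs/a.log"));
        srcList.add(srcRecord(new Status("/in/logs/b.log", false, 300L), "logs/b.log"));
        System.out.println(totalBytes(srcList)); // 1000
    }
}
```

Keeping the size as the key means the list can later be divided into chunks of roughly equal byte counts without opening any of the source files.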
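By default DistCp uses 268435456 bytes (256MB) as the unit of work for one map task, so the number of maps grows with the total amount of data to copy, which -sizelimit can cap. DistCp generates its own InputSplits over the records in _distcp_src_files, dividing them according to the sizes stored as keys. If -m is given, it bounds the number of map tasks and the splits are sized accordingly. Note that -m sets a maximum: the job may still run with fewer maps than the value specified.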
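The arithmetic behind the map count and the splits can be sketched as follows. This is a simplified model, not the DistCp source: the class SplitSketch and its methods are hypothetical, the 256MB constant comes from the text above, and the sketch assumes at least one map, ignores any cluster-wide limits, and groups records greedily in list order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the split arithmetic; not DistCp's own code.
class SplitSketch {

    static final long BYTES_PER_MAP = 268435456L; // 256MB

    // One map per 256MB of input, at least 1, at most maxMaps (the -m value).
    static int mapCount(long totalBytes, int maxMaps) {
        long n = totalBytes / BYTES_PER_MAP;
        n = Math.min(n, (long) maxMaps);
        return (int) Math.max(n, 1L);
    }

    // Group consecutive src-list records (represented by their size keys)
    // into splits of about totalBytes / numMaps bytes each.
    // Files are never cut: a split always ends on a record boundary.
    static List<List<Long>> splits(List<Long> sizes, int numMaps) {
        long total = 0L;
        for (long size : sizes) {
            total += size;
        }
        long target = total / numMaps;

        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        long acc = 0L;
        for (long size : sizes) {
            // Close the current split once adding this record would exceed the target.
            if (acc + size > target && !current.isEmpty()) {
                result.add(current);
                current = new ArrayList<>();
                acc = 0L;
            }
            current.add(size);
            acc += size;
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(mapCount(1073741824L, 20)); // 4
        System.out.println(mapCount(1073741824L, 2));  // 2
        System.out.println(mapCount(100L, 20));        // 1

        List<Long> sizes = new ArrayList<>();
        sizes.add(300L);
        sizes.add(200L);
        sizes.add(100L);
        sizes.add(400L);
        System.out.println(splits(sizes, 2)); // [[300, 200], [100, 400]]
    }
}
```

Because splits end on record boundaries, a greedy grouping like this one can produce more or fewer groups than the requested map count when file sizes are uneven.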